Abstract

Abduction is inference to the best explanation. In the TACITUS project at SRI we have developed an approach to abductive inference, called “weighted abduction”, that has resulted in a significant simplification of how the problem of interpreting texts is conceptualised. The interpretation of a text is the minimal explanation of why the text would be true. More precisely, to interpret a text, one must prove the logical form of the text from what is already mutually known, allowing for coercions, merging redundancies where possible, and making assumptions where necessary. It is shown how such “local pragmatics” problems as reference resolution, the interpretation of compound nominals, the resolution of syntactic ambiguity and metonymy, and schema recognition can be solved in this manner. Moreover, this approach of “interpretation as abduction” can be combined with the older view of “parsing as deduction” to produce an elegant and thorough integration of syntax, semantics, and pragmatics, one that spans the range of linguistic phenomena from phonology to discourse structure. Finally, we discuss means for making the abduction process efficient, possibilities for extending the approach to other pragmatics phenomena, and the semantics of the weights and costs in the abduction scheme.
1 Introduction


Abductive inference is inference to the best explanation. The process of interpreting sentences in discourse can be viewed as the process of providing the best explanation of why the sentences would be true. In the TACITUS Project at SRI, we have developed a scheme for abductive inference that yields a significant simplification in the description of such interpretation processes and a significant extension of the range of phenomena that can be captured. It has been implemented in the TACITUS System (Hobbs, 1986; Hobbs and Martin, 1987; Hobbs et al., 1991) and has been or is being used to solve a variety of interpretation problems in several kinds of messages, including equipment failure reports, naval operations reports, and terrorist reports.

It is a commonplace that people understand discourse so well because they know so much. Accordingly, the aim of the TACITUS Project has been to investigate how knowledge is used in the interpretation of discourse. This has involved building a large knowledge base of commonsense and domain knowledge (see Hobbs et al., 1987), and developing procedures for using this knowledge for the interpretation of discourse. In the latter effort, we have concentrated on problems in “local pragmatics”, specifically, the problems of reference resolution, the interpretation of compound nominals, the resolution of some kinds of syntactic and lexical ambiguity, and metonymy resolution. Our approach to these problems is the focus of the first part of this article. We apply it to other phenomena in the later parts of the article.

In the framework we have developed, what the interpretation of a sentence is can be described very concisely:

To interpret a sentence:

(1) Prove the logical form of the sentence,
        together with the constraints that predicates impose on their arguments,
        allowing for coercions,
    Merging redundancies where possible,
    Making assumptions where necessary.

By the first line we mean “prove, or derive in the logical sense, from the predicate calculus axioms in the knowledge base, the logical form that has been produced by syntactic analysis and semantic translation of the sentence.”

In a discourse situation, the speaker and hearer both have their sets of private beliefs, and there is a large overlapping set of mutual beliefs. (See Figure 1.) An utterance lives on the boundary between mutual belief and the speaker’s private beliefs. It is a bid to extend the area of mutual belief to include some private beliefs of the speaker’s.¹ It is anchored referentially in mutual belief, and when we succeed in proving the logical form and the constraints, we are recognizing this referential anchor. This is the given information, the definite, the presupposed. Where it is necessary to make assumptions, the information comes from the speaker’s private beliefs, and hence is the new information, the indefinite, the asserted. Merging redundancies is a way of getting a minimal, and hence a best, interpretation.²

Consider a simple example.

(2) The Boston office called.

¹ This is clearest in the case of assertions. But questions and commands can also be conceived of as primarily conveying information, namely information about the speaker's wishes. In any case, most of what is required to interpret the three sentences,

John called the Boston office.

Did John call the Boston office?

John, call the Boston office.

is the same.

² Interpreting indirect speech acts, such as “It’s cold in here,” meaning “Close the window,” is not a counterexample to the principle that the minimal interpretation is the best interpretation, but rather can be seen as a matter of achieving the minimal interpretation coherent with the interests of the speaker. More on this in Section 8.2.
Figure 1: The Discourse Situation

This sentence poses at least three local pragmatics problems: the problems of resolving the reference of “the Boston office”, expanding the metonymy to “[Some person at] the Boston office called”, and determining the implicit relation between Boston and the office. Let us put these problems aside for the moment, however, and interpret the sentence according to characterization (1). We must prove abductively the logical form of the sentence together with the constraint “call” imposes on its agent, allowing for a coercion. That is, we must prove abductively the expression (ignoring tense and some other complexities)

\[ (\exists e,x,y,z)\; call'(e,x) \wedge person(x) \wedge rel(x,y) \wedge office(y) \wedge Boston(z) \wedge nn(z,y) \tag{3} \]

That is, there is a calling event e by x, where x is a person. x may or may not be the same as the explicit subject of the sentence, but it is at least related to it, or coercible from it; this is represented by rel(x,y). y is an office, and it bears some unspecified relation nn to z, which is Boston. person(x) is the requirement that call' imposes on its agent x.

The sentence can be interpreted with respect to a knowledge base of mutual knowledge³ that contains the following facts:

\[ Boston(B_1) \]

that is, B1 is the city of Boston.

\[ office(O_1) \wedge in(O_1,B_1) \]

that is, O1 is an office and is in Boston.

\[ person(J_1) \]

that is, John J1 is a person.

\[ work\text{-}for(J_1,O_1) \]

that is, John J1 works for the office O1.

\[ (\forall y,z)\; in(y,z) \supset nn(z,y) \]

that is, if y is in z, then z and y are in a possible compound nominal relation.

\[ (\forall x,y)\; work\text{-}for(x,y) \supset rel(x,y) \]

that is, if x works for y, then y can be coerced into x.

³ Throughout this article it will be assumed that all axioms are mutually known by the speaker and hearer, that they are part of the common cultural background.

The proof of all of (3) is straightforward except for the conjunct call'(e,x). Hence, we assume that; it is the new information conveyed by the sentence.

This interpretation is illustrated in the proof graph of Figure 2, where a rectangle is drawn around the assumed literal call'(e,x). Such proof graphs play the same role in interpretation as parse trees play in syntactic analysis. They are pictures of the interpretations, and we will see a number of such diagrams in this paper.

Figure 2: Interpretation of “The Boston office called.”
Now notice that the three local pragmatics problems have been solved as a by-product. We have resolved “the Boston office” to O1. We have determined the implicit relation in the compound nominal to be in. And we have expanded the metonymy to “John, who works for the Boston office, called.”
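The same bookkeeping can be mimicked by a small backward-chaining search. This is an illustrative toy, not the TACITUS implementation: literals are encoded (our assumption) as Python tuples with variables prefixed by "?", only function-free Horn axioms are supported, and every assumption is charged the same unit cost rather than the graded costs introduced later in the article. The function names (`prove`, `interpret`, and so on) are hypothetical. The search proves each conjunct of (3) where it can, assumes it otherwise, and keeps the interpretation with the fewest assumptions.

```python
from itertools import count

# Mutual-knowledge facts for "The Boston office called." (from the text).
FACTS = [("Boston", "B1"), ("office", "O1"), ("in", "O1", "B1"),
         ("person", "J1"), ("work-for", "J1", "O1")]

# Horn axioms as (head, body); strings starting with "?" are variables.
RULES = [
    (("nn", "?z", "?y"), [("in", "?y", "?z")]),         # in(y,z) ⊃ nn(z,y)
    (("rel", "?x", "?y"), [("work-for", "?x", "?y")]),  # work-for(x,y) ⊃ rel(x,y)
]

_fresh = count()

def is_var(t):
    return t.startswith("?")

def walk(t, s):
    """Follow variable bindings in substitution s to the current value of t."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Unify two function-free literals; return an extended substitution or None."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    s = dict(s)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x):
            s[x] = y
        elif is_var(y):
            s[y] = x
        else:
            return None
    return s

def rename(lit, n):
    """Rename a rule's variables apart by suffixing them with n."""
    return tuple(t + f"_{n}" if is_var(t) else t for t in lit)

def prove(lit, s, depth=3):
    """Yield each substitution under which lit follows from FACTS and RULES."""
    for fact in FACTS:
        s2 = unify(lit, fact, s)
        if s2 is not None:
            yield s2
    if depth == 0:
        return
    for head, body in RULES:
        n = next(_fresh)
        s2 = unify(lit, rename(head, n), s)
        if s2 is not None:
            yield from prove_all([rename(b, n) for b in body], s2, depth - 1)

def prove_all(lits, s, depth):
    """Yield substitutions under which every literal in lits is proved."""
    if not lits:
        yield s
        return
    for s2 in prove(lits[0], s, depth):
        yield from prove_all(lits[1:], s2, depth)

def interpret(goal):
    """Prove what we can, assume the rest; keep the fewest-assumption result."""
    best = None

    def search(i, s, assumed):
        nonlocal best
        if best is not None and len(assumed) >= len(best[0]):
            return  # cannot beat the interpretation found so far
        if i == len(goal):
            best = (assumed, s)
            return
        for s2 in prove(goal[i], s):
            search(i + 1, s2, assumed)
        search(i + 1, s, assumed + [goal[i]])  # assume conjunct i instead

    search(0, {}, [])
    assumed, s = best
    return [tuple(walk(t, s) for t in lit) for lit in assumed], s

# Logical form (3): call'(e,x) ∧ person(x) ∧ rel(x,y) ∧ office(y) ∧ Boston(z) ∧ nn(z,y)
goal = [("call'", "?e", "?x"), ("person", "?x"), ("rel", "?x", "?y"),
        ("office", "?y"), ("Boston", "?z"), ("nn", "?z", "?y")]
assumed, s = interpret(goal)
print(assumed)
print(walk("?x", s), walk("?y", s), walk("?z", s))
```

Only call'(e,J1) ends up assumed. The bindings y = O1 and z = B1 correspond to the resolved office and to the in relation for the compound nominal, and the rel conjunct is proved through work-for, which is the metonymy expansion.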

In the remainder of the article, we develop this basic idea in a variety of ways. In Section 2, we give a high-level overview of the TACITUS system, in which this method of interpretation is implemented. In Section 3, we justify the first clause of characterization (1) by showing in a more detailed fashion that solving local pragmatics problems is equivalent to proving the logical form plus the constraints. In Section 4, we justify the last two clauses by describing our scheme of abductive inference. In Section 5 we present a number of examples of the use of the method for solving local pragmatics problems.

In Section 6 we show how the idea of interpretation as abduction can be combined with the older idea of parsing as deduction to yield a thorough and elegant integration of syntax, semantics, and pragmatics. In Section 7 we discuss related work. In Section 8 we discuss three kinds of future directions: improving the efficiency, extending the coverage, and devising a principled semantics for the numbers in the abduction scheme.

2 The TACITUS System

TACITUS stands for The Abductive Commonsense Inference Text Understanding System. It is intended for processing messages and other texts for a variety of purposes, including message routing and prioritizing, problem monitoring, and database entry and diagnosis on the basis of the information in the texts. It has been used for three applications so far:

1. Equipment failure reports or casualty reports (casreps). These are short, telegraphic messages about breakdowns in machinery. The application is to perform a diagnosis on the basis of the information in the message.

2. Naval operation reports (opreps). These are telegraphic messages about ships attacking other ships, of from one to ten sentences, each of from one to thirty words, generated in the midst of naval exercises. There are frequent misspellings and uses of jargon, and there are more sentence fragments than grammatical sentences. The application is to produce database entries saying who did what to whom, with what instrument, when, where, and with what result.

3. Newspaper articles and similar texts on terrorist activities. The application is again to produce database entries. The texts range from a third of a page to a page and a half. The sentences average 27 words, but sentences of 80 words and more are by no means unusual. The topics talked about in these texts range over much of human activity, so that although the task is narrowly constrained, the texts are not.

To give the reader a concrete sense of these applications, we give an example of the input and output of the system for a relatively short terrorist report, dated March 30, 1989.

A cargo train running from Lima to Lorohia was derailed before dawn today after hitting a dynamite charge.

Inspector Eulogio Flores died in the explosion.

The police reported that the incident took place past midnight in the Carahuaichi-Jaurin area.

Some of the corresponding database entries are as follows:

Incident: Date                  30 Mar 89
Incident: Location              Peru: Carahuaichi-Jaurin (area)
Incident: Type                  Bombing
Physical Target: Description    "cargo train"
Physical Target: Effect         Some Damage: "cargo train"
Human Target: Name              "Eulogio Flores"
Human Target: Description       "inspector": "Eulogio Flores"
Human Target: Effect            Death: "Eulogio Flores"

It must be determined that hitting a dynamite charge constitutes a bombing, that the physical target was the cargo train that hit the charge, and that derailing constitutes damage. It must also be determined that the explosion was the one that resulted from hitting the dynamite charge, and hence that Eulogio Flores is a human target in the incident. The definite noun phrase “the incident” must be resolved to the hitting of the dynamite charge for the location to be recognized.

The system, as it is presently constructed, consists of three components: the syntactic analysis and semantic translation component, the pragmatics component, and the task component. How the pragmatics component works is the topic of Sections 3, 4, and 8.1. Here we describe the other two components very briefly.

The syntactic analysis and semantic translation is done by the DIALOGIC system. DIALOGIC includes a large grammar of English that was constructed in 1980 and 1981 essentially by merging the DIAGRAM grammar of Robinson (1982) with the Linguistic String Project grammar of Sager (1981), including semantic translators for all the rules. It has since undergone further development. Its coverage encompasses all of the major syntactic structures of English, including sentential complements, adverbials, relative clauses, and the most common conjunction constructions. Selectional constraints can be encoded and applied either in a hard mode that rejects parses or in a soft mode that orders parses. A list of possible intra- and inter-sentential antecedents for pronouns is produced, ordered by syntactic criteria. There are a number of heuristics for ordering parses on the basis of syntactic criteria (Hobbs and Bear, 1990). Optionally, the system can produce neutral representations for the most common cases of structural ambiguity (Bear and Hobbs, 1988). DIALOGIC produces a logical form for the sentence in an ontologically promiscuous version of first-order predicate calculus (Hobbs, 1985a), encoding everything that can be determined by purely syntactic means, without recourse to the context or to world knowledge.

This initial logical form is passed to the pragmatics component, which works as described below, to produce an elaborated logical form, making explicit the inferences and assumptions required for interpreting the text and the coreference relations that are discovered in interpretation.

On the basis of the information in the elaborated logical form, the task component produces the required output, for example, the diagnosis or the database entries. The task component is generally fairly small because all of the relevant information has been made explicit by the pragmatics component. Task components can be programmed in a schema-specification language that is a slight extension of first-order predicate calculus (Tyson and Hobbs, 1990).

TACITUS is intended to be largely domain- and application-independent. The lexicon used by DIALOGIC and the knowledge base used by the pragmatics component must of course vary from domain to domain, but the grammar itself and the pragmatics procedure do not vary from one domain to the next. The task component varies from application to application, but the use of the schema-specification language can make even this component largely domain-independent.

A detailed analysis of the performance of the system and its various components is given in Hobbs et al. (1991).

The modular organization of the system into syntax, pragmatics, and task is undercut in Section 6. There we propose a unified framework that incorporates all three modules. The framework has been implemented, however, only in a preliminary experimental manner, due to the effort involved in duplicating the coverage of the DIALOGIC grammar in the new framework.

3 Solving Local Pragmatics Problems as Abductive Inference

3.1 A Notational Convention

Before we proceed, we need to introduce a notational convention (one that we have in fact already used). We will take p(x) to mean that p is true of x, and p'(e,x) to mean that e is the eventuality or possible situation of p being true of x. This eventuality may or may not exist in the real world. The unprimed and primed predicates are related by the axiom schema

\[ (\forall x)\; p(x) \equiv (\exists e)\, p'(e,x) \wedge Rexists(e) \]

where Rexists(e) says that the eventuality e does in fact really exist. Existential quantification by itself only guarantees existence in a Platonic universe of possible entities. This notation, by reifying events and conditions, provides a way of specifying higher-order properties in first-order logic. This Davidsonian reification of eventualities (Davidson, 1967) is a common device in AI. See Hobbs (1985a, 1985b) for further explanation of the specific notation and ontological assumptions.
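A small Python sketch of the convention, under our own simplifying assumptions: eventualities are modeled as hashable records, the Platonic universe of possible eventualities and the set of really existing ones are finite sets, and the particular eventualities (John running, Mary running, Mary's belief) are invented for illustration. `holds` implements the right-hand side of the axiom schema; the last lines show a first-order predicate taking as an argument an eventuality that does not really exist.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Eventuality:
    """A possible eventuality e such that pred'(e, *args) holds."""
    pred: str
    args: tuple

# Hypothetical Platonic universe of possible eventualities.
john_runs = Eventuality("run", ("John",))
mary_runs = Eventuality("run", ("Mary",))
POSSIBLE = {john_runs, mary_runs}
REXISTS = {john_runs}  # only John's running really exists

def holds(pred, *args):
    """p(x) is true iff (∃e) p'(e, x) ∧ Rexists(e)."""
    return any(e.pred == pred and e.args == args and e in REXISTS
               for e in POSSIBLE)

# Reification: believe' takes Mary's running as an ordinary first-order
# argument, even though that eventuality does not really exist.
mary_believes = Eventuality("believe", ("Mary", mary_runs))
POSSIBLE.add(mary_believes)
REXISTS.add(mary_believes)
```

With these sets, run(John) holds and run(Mary) does not, yet believe(Mary, e) holds where e is the merely possible eventuality of Mary running.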

Often axioms that intuitively ought to be written as

\[ (\forall x)\; p(x) \supset q(x) \]

will be written

\[ (\forall e_1,x)\; p'(e_1,x) \supset (\exists e_2)\, q'(e_2,x) \]

That is, if e1 is the eventuality of p being true of x, then there is an eventuality e2 of q being true of x. It will sometimes be convenient to state this in a stronger form. It is not just that if e1 exists, then e2 happens to exist as well. The eventuality e2 exists by virtue of the fact that e1 exists. Let us express this tight connection by the predicate gen, for “generates”. Then the above axiom can be strengthened to

\[ (\forall e_1,x)\; p'(e_1,x) \supset (\exists e_2)\, q'(e_2,x) \wedge gen(e_1,e_2) \]

Not only is there an e2, but there is an e2 by virtue of the fact that there is an e1. The relative existential and modal statuses of e1 and e2 can then be axiomatized in terms of the predicate gen.

3.2 An Example

The following “sentence” from an equipment failure report illustrates four local pragmatics problems.

(4) Disengaged compressor after lube-oil alarm.

Identifying the compressor and the alarm are reference resolution problems. Determining the implicit relation between “lube-oil” and “alarm” is the problem of compound nominal interpretation. Deciding whether “after lube-oil alarm” modifies the compressor or the disengaging is a problem in syntactic ambiguity resolution. The preposition “after” requires an event or condition as its object, and this forces us to coerce “lube-oil alarm” into “the sounding of the lube-oil alarm”; this is an example of metonymy resolution. We wish to show that solving the first three of these problems amounts to deriving the logical form of the sentence. Solving the fourth amounts to deriving the constraints predicates impose on their arguments, allowing for coercions. Thus, to solve all of them is to interpret the sentence according to characterization (1). For each of these problems, our approach is to frame a logical expression whose derivation, or proof, constitutes an interpretation.

Reference: To resolve the reference of “compressor” in sentence (4), we need to prove (constructively) the following logical expression:

\[ (\exists c)\; compressor(c) \tag{5} \]

If, for example, we prove this expression by using axioms that say C1 is a “starting air compressor”,⁴ and that a starting air compressor is a compressor, then we have resolved the reference of “compressor” to C1.

⁴ That is, a compressor for the air used to start the ship’s gas turbine engines.

In general, we would expect definite noun phrases to refer to entities the hearer already knows about and can identify, and indefinite noun phrases to refer to new entities the speaker is introducing. However, in the casualty reports most noun phrases have no determiners. There are sentences, such as

Retained oil sample and filter element for future analysis.

where “sample” is indefinite, or new information, and “filter element” is definite, or already known to the hearer. In this case, we try to prove the existence of both the sample and the filter. When we fail to prove the existence of the sample, we know that it is new, and we simply assume its existence.
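The strategy for nominal reference (prove existence constructively, and assume a new entity only when the proof fails) can be sketched as follows. The knowledge base contents, the single subtype axiom, and the fresh constant names such as X1 are hypothetical, and the sketch ignores the assumption costs that, in the full scheme, make failing to resolve a definite noun phrase expensive.

```python
from itertools import count

# Hypothetical knowledge base: known entities and their most specific types.
KNOWN = {"C1": "starting-air-compressor", "F1": "filter-element"}
# Axioms of the form (∀x) sub(x) ⊃ super(x):
# a starting air compressor is a compressor.
ISA = {"starting-air-compressor": "compressor"}

_new_ids = count(1)

def has_type(entity, noun):
    """Prove noun(entity) by chaining through the subtype axioms."""
    t = KNOWN.get(entity)
    while t is not None:
        if t == noun:
            return True
        t = ISA.get(t)
    return False

def resolve(noun):
    """Constructively prove (∃x) noun(x); if no proof exists, assume a new x."""
    for entity in KNOWN:
        if has_type(entity, noun):
            return entity, "given"
    new = f"X{next(_new_ids)}"  # hypothetical fresh constant
    KNOWN[new] = noun           # the assumption is now part of what is known
    return new, "new"

compressor = resolve("compressor")     # proved via C1 and the subtype axiom
filter_el = resolve("filter-element")  # proved: already known
sample = resolve("sample")             # not provable, so assumed
```

"Compressor" resolves to C1 and "filter element" to F1, while the sample, which cannot be proved to exist, becomes a newly assumed entity.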

Elements in a sentence other than nominals can also function referentially. In

Alarm sounded.

Alarm activated during routine start of compressor.

one can argue that the activation is the same as, or at least implicit in, the sounding. Hence, in addition to trying to derive expressions such as (5) for nominal reference, for possible non-nominal reference we try to prove similar expressions:

\[ (\exists \ldots, e, a, \ldots)\; \ldots \wedge activate'(e,a) \wedge \ldots \]

That is, we wish to derive the existence, from background knowledge or the previous text, of some known or implied activation. Most, but certainly not all, information conveyed non-nominally is new, and hence will be assumed by means described in Section 4.

Compound Nominals: To resolve the reference of the noun phrase “lube-oil alarm”, we need to find two entities o and a with the appropriate properties. The entity o must be lube oil, a must be an alarm, and there must be some implicit relation between them. If we call that implicit relation nn, then the expression that must be proved is

\[ (\exists o,a,nn)\; lube\text{-}oil(o) \wedge alarm(a) \wedge nn(o,a) \]

In the proof, instantiating nn amounts to interpreting the implicit relation between the two nouns in the compound nominal. Compound nominal interpretation is thus just a special case of reference resolution.

Treating nn as a predicate variable in this way assumes that the relation between the two nouns can be anything, and there are good reasons for believing this to be the case (e.g., Downing, 1977). In “lube-oil alarm”, for example, the relation is

\[ \lambda x,y\; [\,y \text{ sounds when the pressure of } x \text{ drops too low}\,] \]

However, in our implementation we use a first-order simulation of this approach. The symbol nn is treated as a predicate constant, and the most common possible relations (see Levi, 1978) are encoded in axioms. The axiom

\[ (\forall x,y)\; part(y,x) \supset nn(x,y) \]

allows interpretation of compound nominals of the form “<whole> <part>”, such as “filter element”. Axioms of the form

\[ (\forall x,y)\; sample(y,x) \supset nn(x,y) \]

handle the very common case in which the head noun is a relational noun and the prenominal noun fills one of its roles, as in “oil sample”. Complex relations such as the one in “lube-oil alarm” can sometimes be glossed as “for”:

\[ (\forall x,y)\; for(y,x) \supset nn(x,y) \]
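In this first-order simulation, proving a compound nominal amounts to finding a modifier entity, a head entity, and one of the nn axioms whose antecedent holds between them. The sketch below assumes a hypothetical toy knowledge base: the entity names and facts are invented, and only the three axioms above (part, sample, for) are encoded.

```python
# Hypothetical entities with their types, and relational facts between them.
TYPES = {"F1": "filter", "E1": "element", "OIL1": "oil", "S1": "sample",
         "LO1": "lube-oil", "A1": "alarm"}
FACTS = {("part", "E1", "F1"),      # element E1 is part of filter F1
         ("sample", "S1", "OIL1"),  # S1 is a sample of oil OIL1
         ("for", "A1", "LO1")}      # alarm A1 is for lube oil LO1

# First-order simulation of nn: one axiom (∀x,y) R(y,x) ⊃ nn(x,y) per relation R.
NN_AXIOMS = ("part", "sample", "for")

def resolve_compound(modifier, head):
    """Prove (∃o,a) modifier(o) ∧ head(a) ∧ nn(o,a); report the axiom used for nn."""
    return [(o, a, r)
            for o, t_o in TYPES.items() if t_o == modifier
            for a, t_a in TYPES.items() if t_a == head
            for r in NN_AXIOMS if (r, a, o) in FACTS]
```

Each result names the entities found and the axiom that instantiated nn, which is the interpretation of the implicit relation.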

Syntactic Ambiguity: Some of the most common types of syntactic ambiguity, including prepositional phrase and other attachment ambiguities and very compound nominal ambiguities,⁵ can be converted into constrained coreference problems (see Bear and Hobbs, 1988). For example, in (4) the first argument of after is taken to be an existentially quantified variable which is equal to either the compressor or the disengaging event. The logical form would thus include

\[ (\exists \ldots, e, c, y, a, \ldots)\; \ldots \wedge after(y,a) \wedge y \in \{c,e\} \wedge \ldots \]

That is, no matter how after(y,a) is proved or assumed, y must be equal to either the compressor c or the disengaging e. This kind of ambiguity is often solved as a by-product of the resolution of metonymy or of the merging of redundancies.
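One way to picture the constrained variable y ∈ {c, e} is as a small finite domain that the rest of the proof narrows. The sketch below is a deliberate simplification under stated assumptions: which entities count as events is hypothetical, identity coercion is taken to be free, and any non-identity coercion of y is charged a flat illustrative $20 (the coercion cost used later in this section) instead of being searched for.

```python
# Candidate attachment sites for "after lube-oil alarm" in (4): y ∈ {c, e}.
SITES = {"c": "compressor", "e": "disengaging"}
EVENTS = {"disengaging"}  # hypothetical: which kinds of entity are events

def attachment_cost(y, coercion_cost=20):
    """Cost of satisfying event(k1) ∧ rel(k1, y) for a given choice of y.

    Identity coercion (k1 = y) is free when y is already an event; otherwise
    we simply charge the assumability cost of rel as a stand-in for coercing.
    """
    return 0 if SITES[y] in EVENTS else coercion_cost

def attach():
    """Pick the member of {c, e} that yields the cheapest interpretation."""
    return min(SITES, key=attachment_cost)
```

Under these assumptions the prepositional phrase attaches to the disengaging e, because only that choice satisfies the event constraint of after without a coercion.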

Metonymy: Predicates impose constraints on their arguments that are often violated. When they are violated, the arguments must be coerced into something related that satisfies the constraints. This is the process of metonymy resolution.⁶ Let us suppose, for example, that in sentence (4), the predicate after requires its arguments to be events:

\[ after(e_1,e_2) : event(e_1) \wedge event(e_2) \]

To allow for coercions, the logical form of the sentence is altered by replacing the explicit arguments by “coercion variables” which satisfy the constraints and which are related somehow to the explicit arguments. Thus the altered logical form for (4) would include

\[ (\exists \ldots, k_1, k_2, y, a, rel_1, rel_2, \ldots)\; \ldots \wedge after(k_1,k_2) \wedge event(k_1) \wedge rel_1(k_1,y) \wedge event(k_2) \wedge rel_2(k_2,a) \wedge \ldots \]

Here, k1 and k2 are the coercion variables, and the after relation obtains between them, rather than between y and a. k1 and k2 are both events, and k1 and k2 are coercible from y and a, respectively. The coercion relations rel1 and rel2 may, of course, be identity, in which case there is no metonymy.

As in the most general approach to compound nominal interpretation, this treatment is second-order, and suggests that any relation at all can hold between the implicit and explicit arguments. Nunberg (1978), among others, has in fact argued just this point. However, in our implementation, we are using a first-order simulation: rel is treated as a predicate constant, and there are a number of axioms that specify what the possible coercions are. Identity is one possible relation, since the explicit arguments could in fact satisfy the constraints:

\[ (\forall x)\; rel(x,x) \]

In general, where this works, it will lead to the best interpretation. We can also coerce from a whole to a part and from an object to its function. Hence,

\[ (\forall x,y)\; part(x,y) \supset rel(x,y) \]

\[ (\forall x,e)\; function(e,x) \supset rel(e,x) \]

⁵ A very compound nominal is a string of two or more nouns preceding a head noun, as in “Stanford Research Institute”. The ambiguity they pose is whether the first noun is taken to modify the second or the third.

⁶ There are other interpretive moves in this situation besides metonymic interpretation, such as metaphoric interpretation. We will confine ourselves here to metonymy, however.
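The three rel axioms can be read as a generator of candidate coercions. In this sketch the facts (an alarm whose function is its sounding, a part of the compressor) are invented for illustration, and a coercion is accepted only if the coerced entity satisfies the predicate's constraint, such as event(k).

```python
# Hypothetical facts: the alarm's function is its sounding, which is an event;
# the disengaging is itself an event; P1 is a part of compressor C1.
FACTS = {("alarm", "A1"), ("event", "SOUND1"), ("function", "SOUND1", "A1"),
         ("event", "DIS1"), ("compressor", "C1"), ("part", "P1", "C1")}

def rel_candidates(x):
    """Yield (k, axiom) such that rel(k, x) follows from one of the rel axioms."""
    yield x, "identity"                               # (∀x) rel(x,x)
    for fact in sorted(FACTS):
        if fact[0] == "part" and fact[2] == x:        # part(x,y) ⊃ rel(x,y)
            yield fact[1], "part"
        elif fact[0] == "function" and fact[2] == x:  # function(e,x) ⊃ rel(e,x)
            yield fact[1], "function"

def coerce(x, constraint):
    """Return the coercions k of x that satisfy constraint(k), e.g. event(k)."""
    return [(k, how) for k, how in rel_candidates(x) if (constraint, k) in FACTS]
```

The alarm is coerced to its sounding through the function axiom, the disengaging needs only identity, and the compressor has no event coercion in this toy knowledge base.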

Putting It All Together: Putting it all together, we find that to solve all the local pragmatics problems posed by sentence (4), we must derive the following expression:

\[ \begin{aligned} (\exists e,x,c,k_1,k_2,y,a,o)\; & Past(e) \wedge disengage'(e,x,c) \wedge compressor(c) \\ & \wedge after(k_1,k_2) \wedge event(k_1) \wedge rel(k_1,y) \wedge y \in \{c,e\} \\ & \wedge event(k_2) \wedge rel(k_2,a) \wedge alarm(a) \wedge nn(o,a) \wedge lube\text{-}oil(o) \end{aligned} \]

But this is just the logical form of the sentence,⁷ together with the constraints that predicates impose on their arguments, allowing for coercions. That is, it is the first half of our characterization (1) of what it is to interpret a sentence.

When parts of this expression cannot be derived, assumptions must be made, and these assumptions are taken to be the new information. The likelihood that different conjuncts in this expression will be new information varies according to how the information is presented, linguistically. The main verb is more likely to convey new information than a definite noun phrase. Thus, we assign a cost to each of the conjuncts: the cost of assuming that conjunct. This cost is expressed in the same currency in which other factors involved in the “goodness” of an interpretation are expressed; among these factors are likely to be the length of the proofs used and the salience of the axioms they rely on. Since a definite noun phrase is generally used referentially, an interpretation that simply assumes the existence of the referent and thus fails to identify it should be an expensive one. It is therefore given a high assumability cost. For purposes of concreteness, let’s just call this $10. Indefinite noun phrases are not usually used referentially, so they are given a low cost, say, $1. Bare noun phrases are given an intermediate cost, say, $5. Propositions presented non-nominally are usually new information, so they are given a low cost, say, $3. One does not usually use selectional constraints to convey new information, so they are given the same cost as definite noun phrases. Coercion relations and the compound nominal relations are given a very high cost, say, $20, since to assume them is to fail to solve the interpretation problem. If we place the assumability costs as superscripts on their conjuncts in the above logical form, we get the following expression:

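\[ \begin{aligned} (\exists e,x,c,k_1,k_2,y,a,o)\; & Past(e)^{\$3} \wedge disengage'(e,x,c)^{\$3} \wedge compressor(c)^{\$5} \\ & \wedge after(k_1,k_2)^{\$3} \wedge event(k_1)^{\$10} \wedge rel(k_1,y)^{\$20} \wedge y \in \{c,e\} \\ & \wedge event(k_2)^{\$10} \wedge rel(k_2,a)^{\$20} \wedge alarm(a)^{\$5} \wedge nn(o,a)^{\$20} \wedge lube\text{-}oil(o)^{\$5} \end{aligned} \]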

While this example gives a rough idea of the relative assumability costs, the real costs must mesh well with the inference processes and thus must be determined experimentally. The use of numbers here and throughout the next section constitutes one possible regime with the needed properties. This issue is addressed more fully in Section 8.3.
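To make the arithmetic concrete, the following sketch totals the illustrative dollar figures from the weighted logical form above for two candidate interpretations. It is only a bookkeeping aid under our assumptions: it ignores proof length, axiom salience, and how costs flow through axioms, and the two candidates are hand-picked rather than generated by a prover.

```python
# Illustrative assumability costs from the weighted logical form (in $).
COSTS = {
    "Past(e)": 3, "disengage'(e,x,c)": 3, "compressor(c)": 5,
    "after(k1,k2)": 3, "event(k1)": 10, "rel(k1,y)": 20,
    "event(k2)": 10, "rel(k2,a)": 20, "alarm(a)": 5,
    "nn(o,a)": 20, "lube-oil(o)": 5,
}

def cost(proved):
    """Cost of an interpretation: summed costs of every conjunct it must assume."""
    return sum(c for lit, c in COSTS.items() if lit not in proved)

# Two hand-picked candidates, described by the conjuncts each manages to prove.
assume_everything = set()
resolve_references = set(COSTS) - {"Past(e)", "disengage'(e,x,c)", "after(k1,k2)"}

best = min([assume_everything, resolve_references], key=cost)
```

Assuming every conjunct costs $104, whereas proving the referential, selectional, coercion, and compound nominal conjuncts and assuming only Past(e), disengage'(e,x,c), and after(k1,k2) costs $9, so the cheaper interpretation is the one that solves the local pragmatics problems.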

⁷ For justification of this kind of logical form for sentences with quantifiers and intensional operators, see Hobbs (1983b, 1985a).
4 Weighted Abduction

4.1  The  Method 

In deduction, from \((\forall x)\,p(x) \supset q(x)\) and p(A), one concludes q(A). In induction, from p(A) and q(A), or more likely, from a number of instances of p(A) and q(A), one concludes \((\forall x)\,p(x) \supset q(x)\). Abduction is the third possibility. From \((\forall x)\,p(x) \supset q(x)\) and q(A), one concludes p(A). One can think of q(A) as the observable evidence, of \((\forall x)\,p(x) \supset q(x)\) as a general principle that could explain q(A)’s occurrence, and of p(A) as the inferred, underlying cause or explanation of q(A). Of course, this mode of inference is not valid; there may be many possible such p(A)’s. Therefore, other criteria are needed to choose among the possibilities.
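The contrast between deduction and abduction can be seen in a toy Python sketch. The rule encoding (a pair (p, q) standing for \((\forall x)\,p(x) \supset q(x)\)) and the rule "water is a fluid" are our own illustrative assumptions. Deduction runs a rule forward; abduction runs it backward and, as the text notes, returns every candidate explanation rather than one.

```python
from typing import List, Tuple

# (p, q) encodes the rule (forall x) p(x) > q(x); an atom is (predicate, constant).
Rule = Tuple[str, str]
Atom = Tuple[str, str]


def deduce(rules: List[Rule], fact: Atom) -> List[Atom]:
    """Deduction: from p(A) and p > q, conclude q(A)."""
    pred, const = fact
    return [(q, const) for p, q in rules if p == pred]


def abduce(rules: List[Rule], observation: Atom) -> List[Atom]:
    """Abduction: from q(A) and p > q, hypothesize p(A).

    Not a valid inference: every rule concluding q yields a candidate
    explanation, so further criteria must choose among them.
    """
    pred, const = observation
    return [(p, const) for p, q in rules if q == pred]


# Hypothetical rules: lube oil is a fluid, and so is water.
RULES = [("lube-oil", "fluid"), ("water", "fluid")]
print(deduce(RULES, ("lube-oil", "A")))  # [('fluid', 'A')]
print(abduce(RULES, ("fluid", "A")))     # [('lube-oil', 'A'), ('water', 'A')]
```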

One obvious criterion is the consistency of p(A) with the rest of what one knows. Two other criteria are what Thagard (1978) has called simplicity and consilience. Roughly, simplicity is that p(A) should be as small as possible, and consilience is that q(A) should be as big as possible. We want to get more bang for the buck, where q(A) is bang, and p(A) is buck.

There is a property of natural language discourse, noticed by a number of linguists (e.g., Joos, 1972; Wilks, 1972), that suggests a role for simplicity and consilience in interpretation: its high degree of redundancy. Consider

Inspection  of  oil  filter  revealed  metal  particles. 

An inspection is a looking at that causes one to learn a property relevant to the function of the inspected object. The function of a filter is to capture particles from a fluid. To reveal is to cause one to learn. If we assume the two causings to learn are identical, the two sets of particles are identical, and the two functions are identical, then we have explained the sentence in a minimal fashion. Because we have exploited this redundancy, a small number of inferences and assumptions (simplicity) have explained a large number of syntactically independent propositions in the sentence (consilience). As a by-product, we have moreover shown that the inspector is the one to whom the particles are revealed and that the particles are in the filter, facts which are not explicitly conveyed by the sentence.

Another issue that arises in abduction in choosing among potential explanations is what might be called the “informativeness-correctness tradeoff”. Many previous uses of abduction in AI from a theorem-proving perspective have been in diagnostic reasoning (e.g., Pople, 1973; Cox and Pietrzykowski, 1986), and they have assumed “most-specific abduction”. If we wish to explain chest pains, it is not sufficient to assume the cause is simply chest pains. We want something more specific, such as “pneumonia”. We want the most specific possible explanation. In natural language processing, however, we often want the least specific assumption. If there is a mention of a fluid, we do not necessarily want to assume it is lube oil. Assuming simply the existence of a fluid may be the best we can do.⁸ However, if there is corroborating evidence, we may want to make a more specific assumption. In

Alarm  sounded.  Flow  obstructed. 

⁸ As Freud is purported to have said, “Sometimes a cigar is just a cigar.”
we know the alarm is for the lube oil pressure, and this provides evidence that the flow is not merely of a fluid but of lube oil. The more specific our assumptions are, the more informative our interpretation is. The less specific they are, the more likely they are to be correct.

We therefore need a scheme of abductive inference with three features. First, it should be possible for goal expressions to be assumable, at varying costs. Second, there should be the possibility of making assumptions at various levels of specificity. Third, there should be a way of exploiting the natural redundancy of texts to yield more economical proofs.

We have devised just such an abduction scheme.⁹ First, every conjunct in the logical form of the sentence is given an assumability cost, as described at the end of Section 3. Second, this cost is passed back to the antecedents in Horn clauses by assigning weights to them. Axioms are stated in the form

\[ P_1^{w_1} \wedge P_2^{w_2} \supset Q \tag{6} \]

This says that \(P_1\) and \(P_2\) imply \(Q\), but also that if the cost of assuming \(Q\) is \(c\), then the cost of assuming \(P_1\) is \(w_1 c\), and the cost of assuming \(P_2\) is \(w_2 c\).¹⁰ Third, factoring or synthesis is allowed. That is, goal expressions may be unified, in which case the resulting expression is given the smaller of the costs of the input expressions. Thus, if the goal expression is of the form

\[ (\exists \ldots, x, y, \ldots)\, \ldots \wedge q(x) \wedge \ldots \wedge q(y) \wedge \ldots \]

where q(x) costs $20 and q(y) costs $10, then factoring assumes x and y to be identical and yields an expression of the form

\[ (\exists \ldots, x, \ldots)\, \ldots \wedge q(x) \wedge \ldots \]

where q(x) costs $10. This feature leads to minimality through the exploitation of redundancy.
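The two mechanisms just described, passing a cost back through weights and factoring two goals into one, can be sketched in a few lines of Python. The sketch makes simplifying assumptions of our own: arguments are variables only (no constants or function terms), the consistency check on assumptions is omitted, and names such as antecedent_costs and factor are hypothetical.

```python
from typing import Dict, Optional, Tuple

# A goal literal: (predicate, argument variables, assumability cost in dollars).
Literal = Tuple[str, Tuple[str, ...], float]


def antecedent_costs(consequent_cost: float,
                     weights: Tuple[float, ...]) -> Tuple[float, ...]:
    """For P1^w1 & ... & Pn^wn > Q: if assuming Q costs c, Pi costs wi * c."""
    return tuple(w * consequent_cost for w in weights)


def walk(var: str, subst: Dict[str, str]) -> str:
    """Follow a chain of variable bindings to its end."""
    while var in subst:
        var = subst[var]
    return var


def unify_vars(a: Tuple[str, ...],
               b: Tuple[str, ...]) -> Optional[Dict[str, str]]:
    """Unify two argument lists that contain only variables."""
    if len(a) != len(b):
        return None
    subst: Dict[str, str] = {}
    for x, y in zip(a, b):
        x, y = walk(x, subst), walk(y, subst)
        if x != y:
            subst[y] = x  # identify y with x
    return subst


def factor(g1: Literal, g2: Literal) -> Optional[Tuple[Literal, Dict[str, str]]]:
    """Factoring: unify two goals; the merged goal keeps the smaller cost."""
    (p1, args1, c1), (p2, args2, c2) = g1, g2
    if p1 != p2:
        return None
    subst = unify_vars(args1, args2)
    if subst is None:
        return None
    merged = (p1, tuple(walk(v, subst) for v in args1), min(c1, c2))
    return merged, subst


print(factor(("q", ("x",), 20.0), ("q", ("y",), 10.0)))
# (('q', ('x',), 10.0), {'y': 'x'})
print(antecedent_costs(10.0, (0.6, 0.6)))  # (6.0, 6.0)
```

The walk helper matters when bindings chain: unifying (x, z) with (y, x) binds y to x and then x to z, so all three variables end up identified.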

Note that in (6), if \(w_1 + w_2 < 1\), most-specific abduction is favored: why assume \(Q\) when it is cheaper to assume \(P_1\) and \(P_2\)? If \(w_1 + w_2 > 1\), least-specific abduction is favored: why assume \(P_1\) and \(P_2\) when it is cheaper to assume \(Q\)? But in

\[ P_1^{.6} \wedge P_2^{.6} \supset Q \]

if \(P_1\) has already been derived, it is cheaper to assume \(P_2\) than \(Q\). \(P_1\) has provided evidence for \(Q\), and assuming the “balance” \(P_2\) of the necessary evidence for \(Q\) should be cheaper. Factoring can also override least-specific abduction. Suppose we have the axioms

\[ P_1^{.6} \wedge P_2^{.6} \supset Q_1 \]

\[ P_2^{.6} \wedge P_3^{.6} \supset Q_2 \]

⁹ The abduction scheme is due to Mark Stickel, and it, or a variant of it, is described at greater length in Stickel (1989).

¹⁰ Stickel (1989) generalizes the weights to arbitrary functions of \(c\).
and we wish to derive \(Q_1 \wedge Q_2\), where each conjunct has an assumability cost of $10. Assuming \(Q_1 \wedge Q_2\) will then cost $20, whereas assuming \(P_1 \wedge P_2 \wedge P_3\) will cost only $18, since the two instances of \(P_2\) can be unified. Thus, the abduction scheme allows us to adopt the careful policy of favoring least-specific abduction while also allowing us to exploit the redundancy of texts for more specific interpretations.
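The arithmetic in this paragraph can be checked with a few lines of Python. This is a sketch assuming a weight of .6 on every antecedent in the two axioms \(P_1^{.6} \wedge P_2^{.6} \supset Q_1\) and \(P_2^{.6} \wedge P_3^{.6} \supset Q_2\), with each goal assumable at $10; the variable names are ours. It also shows the case the text leaves implicit: without factoring, back-chaining would cost $24, more than simply assuming the goals.

```python
# Worked numbers for  P1^.6 & P2^.6 > Q1  and  P2^.6 & P3^.6 > Q2,
# with Q1 and Q2 each assumable at $10 (values as assumed in the lead-in).
C, W = 10.0, 0.6

# Least-specific abduction: assume the goals themselves.
assume_goals = 2 * C

# Back-chain: every antecedent inherits W * C; P2 is reached twice.
goals = {"P1": [W * C], "P2": [W * C, W * C], "P3": [W * C]}
without_factoring = sum(sum(costs) for costs in goals.values())
# Factoring unifies the two P2 goals and keeps the smaller cost.
with_factoring = sum(min(costs) for costs in goals.values())

# Single axiom with P1 already derived: only the "balance" P2 is assumed.
balance = W * C

print(assume_goals, without_factoring, with_factoring, balance)
# 20.0 24.0 18.0 6.0
```

Because .6 + .6 > 1, back-chaining through either axiom alone ($12) is dearer than assuming its consequent ($10), which is the least-specific preference; the shared \(P_2\) is what tips the balance toward the more specific interpretation.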

Finally,  we  should  note  that  whenever  an  assumption  is  made,  it  first  must  be  checked 
for  consistency.  Problems  associated  with  this  requirement  are  discussed  in  Section  8.1. 

In the above examples we have used equal weights on the conjuncts in the antecedents. It is more reasonable, however, to assign the weights according to the “semantic contribution” each conjunct makes to the consequent. Consider, for example, the axiom

\[ (\forall x)\, \mathit{car}(x)^{.8} \wedge \mathit{no\text{-}top}(x)^{.4} \supset \mathit{convertible}(x) \]

We have an intuitive sense that car contributes more to convertible than no-top does. We are more likely to assume something is a convertible if we know that it is a car than if we know it has no top.¹¹ The weights on the conjuncts in the antecedent are adjusted accordingly.
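A brief worked comparison shows how such weights play out. Assuming the weights .8 on car(x) and .4 on no-top(x), and writing \(c\) for the cost of assuming convertible(x), the cost of each route is:

\[
\begin{aligned}
\mathit{car}(x) \text{ known, assume } \mathit{no\text{-}top}(x): &\quad 0.4c < c \\
\mathit{no\text{-}top}(x) \text{ known, assume } \mathit{car}(x): &\quad 0.8c < c \\
\text{neither known, assume both}: &\quad (0.8 + 0.4)c = 1.2c > c
\end{aligned}
\]

Under these assumed weights, knowing that x is a car makes the convertible reading twice as cheap as knowing only that it has no top, while with no evidence at all the weights sum to more than one and it is cheaper simply to assume convertible(x).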

Exactly how the weights and costs should be assigned is a matter of continuing research. Our experience so far suggests that which interpretation is chosen is sensitive to whether the weights add up to more or less than one, but that otherwise the system’s performance is fairly impervious to small changes in the values of the weights and costs. In Section 8.1, there is some further discussion about the uses the numbers can be put to in making the abduction procedure more efficient, and in Section 8.3, there is a discussion of the semantics of the numbers.

4.2 Et Cetera Propositions and the Form of Axioms

In the abductive approach to interpretation, we determine what implies the logical form of the sentence rather than determining what can be inferred from it. We backward-chain rather than forward-chain. Thus, one would think that we could not use superset information in processing the sentence. Since we are backward-chaining from the propositions in the logical form, the fact that, say, lube oil is a fluid, which would be expressed as

\[ (\forall x)\, \mathit{lube\text{-}oil}(x) \supset \mathit{fluid}(x) \tag{7} \]

could not play a role in the analysis of a sentence containing “lube oil”. This is inconvenient. In the text

Flow obstructed. Metal particles in lube oil filter.

we know from the first sentence that there is a fluid. We would like to identify it with the lube oil mentioned in the second sentence. In interpreting the second sentence, we must prove the expression

\[ (\exists x)\, \mathit{lube\text{-}oil}(x) \]

¹¹ To prime this intuition, imagine two doors. Behind one is a car. Behind the other is something with no top. You pick a door. If there’s a convertible behind it, you get to keep it. Which door would you pick?
If we had as an axiom

\[ (\forall x)\, \mathit{fluid}(x) \supset \mathit{lube\text{-}oil}(x) \]

then  we  could  establish  the  identity.  But  of  course  we  don’t  have  such  an  axiom,  for  it 
isn’t  true.  There  are  lots  of  other  kinds  of  fluids.  There  would  seem  to  be  no  way  to  use 
superset  information  in  our  scheme. 

Fortunately,  however,  there  is  a  way.  We  can  make  use  of  this  information  by  converting 
the  axiom  to  a  biconditional.  In  general,  axioms  of  the  form 

\[ \mathit{species} \supset \mathit{genus} \]

can be converted into biconditional axioms of the form

\[ \mathit{genus} \wedge \mathit{differentiae} \equiv \mathit{species} \]

Often, as in the above example, we will not be able to prove the differentiae, and in many cases the differentiae cannot even be spelled out. But in our abductive scheme, this does not matter; they can simply be assumed. In fact, we need not state them explicitly. We can simply introduce a predicate which stands for all the remaining properties. It will never be provable, but it will be assumable. Thus, we can rewrite (7) as

\[ (\forall x)\, \mathit{fluid}(x)^{.6} \wedge \mathit{etc}_1(x)^{.6} \equiv \mathit{lube\text{-}oil}(x) \tag{7a} \]

Then the fact that something is fluid can be used as evidence for its being lube oil, since we can assume \(\mathit{etc}_1(x)\). With the weights distributed according to semantic contribution, we can go to extremes and use an axiom like

\[ (\forall x)\, \mathit{mammal}(x)^{.2} \wedge \mathit{etc}_2(x)^{.9} \supset \mathit{elephant}(x) \]

to allow us to use the fact that something is a mammal as (weak) evidence for its being an elephant. This axiom can be taken to say, “One way of being a mammal is being an elephant.”
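The rewriting of a species-genus axiom into an “et cetera” axiom is mechanical enough to sketch in Python. In this illustration the helper names (Axiom, fresh_etc, reverse_with_etc, cost_given) are hypothetical, the weights .6/.6 for lube oil and .2/.9 for elephant are assumed values, and a cost is simply weight times the consequent’s cost c. It shows that once the genus is proved, only the et cetera literal remains to be assumed, and its weight measures how strong the genus is as evidence.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator, Set, Tuple


@dataclass(frozen=True)
class Axiom:
    """Weighted Horn clause: (predicate, weight) antecedents imply the consequent."""
    antecedents: Tuple[Tuple[str, float], ...]
    consequent: str


def fresh_etc() -> Iterator[str]:
    """Generate never-reused et cetera predicate names: etc1, etc2, ..."""
    for i in itertools.count(1):
        yield f"etc{i}"


def reverse_with_etc(species: str, genus: str, w_genus: float, w_etc: float,
                     names: Iterator[str]) -> Axiom:
    """Rewrite 'species > genus' as 'genus^w_genus & etcN^w_etc > species'.

    etcN occurs in no other axiom, so it can never be proved, only assumed.
    """
    return Axiom(((genus, w_genus), (next(names), w_etc)), species)


def cost_given(axiom: Axiom, known: Set[str], c: float) -> float:
    """Cost of deriving the consequent when the 'known' antecedents are proved."""
    return sum(w * c for pred, w in axiom.antecedents if pred not in known)


names = fresh_etc()
lube_oil = reverse_with_etc("lube-oil", "fluid", 0.6, 0.6, names)
elephant = reverse_with_etc("elephant", "mammal", 0.2, 0.9, names)
print(lube_oil.antecedents[1][0], elephant.antecedents[1][0])  # etc1 etc2
print(cost_given(lube_oil, {"fluid"}, 10.0))   # 6.0: fluid is fair evidence
print(cost_given(elephant, {"mammal"}, 10.0))  # 9.0: mammal is weak evidence
```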

Although this device may seem ad hoc, we view it as implementing a fairly general solution to the problems of nonmonotonicity in commonsense reasoning and vagueness of meaning in natural language. The use of “et cetera” propositions is a very powerful, and liberating, device. Before we hit upon this device, in our attempts at axiomatizing a domain in a way that would accommodate many texts, we were always “arrow hacking”: trying to figure out which way the implication had to go if we were to get the right interpretations, and lamenting when that made no semantic sense. With “et cetera” predications, that problem went away, and for principled reasons. Implicative relations could be used in either direction. Moreover, their use is liberating when constructing axioms for a knowledge base. It is well known that almost no concept can be defined precisely. We are now able to come as close to a definition as we can and introduce an “et cetera” proposition with an appropriate weight to indicate how far short we feel we have fallen.

The “et cetera” propositions play a role analogous to the abnormality propositions of circumscriptive logic (McCarthy, 1987). In circumscriptive theories it is usual to write axioms like
\[ (\forall x)\, \mathit{bird}(x) \wedge \neg \mathit{Ab}_1(x) \supset \mathit{flies}(x) \]

This  certainly  looks  like  the  axiom 

\[ (\forall x)\, \mathit{bird}(x) \wedge \mathit{etc}_3(x) \supset \mathit{flies}(x) \]

The literal \(\neg \mathit{Ab}_1(x)\) says that x is not abnormal in some particular respect. The literal \(\mathit{etc}_3(x)\) says that x possesses certain unspecified properties, for example, that x is not abnormal in that same respect. In circumscription, one minimizes over the abnormality predicates, assuming they are false wherever possible, perhaps with a partial ordering on abnormality predicates to determine which assumptions to select (e.g., Poole, 1989). Our abduction scheme generalizes this a bit: the literal \(\mathit{etc}_3(x)\) may be assumed if no contradiction results and if the resulting proof is the most economical one available. Moreover, the “et cetera” predicates can be used for any kind of differentiae distinguishing a species from the rest of a genus, and not just for those related to normality.

There is no particular difficulty in specifying a semantics for the “et cetera” predicates. Formally, \(\mathit{etc}_1\) in axiom (7a) can be taken to denote the set of all things that either are not fluid or are lube oil. Intuitively, \(\mathit{etc}_1\) conveys all the information one would need to know beyond fluidness to conclude that something is lube oil. As with nearly every predicate in an axiomatization of commonsense knowledge, it is hopeless to spell out necessary and sufficient conditions for an “et cetera” predicate. In fact, the use of such predicates is motivated largely by a recognition of this fact about commonsense knowledge.

The “et cetera” predicates could be used as the abnormality predicates are in circumscriptive logic, with separate axioms spelling out conditions under which they would hold. However, in the view adopted here, more detailed conditions would be spelled out by expanding axioms of the form

\[ (\forall x)\, p_1(x) \wedge \mathit{etc}_4(x) \supset q(x) \]
to  axioms  of  the  form 

\[ (\forall x)\, p_1(x) \wedge p_2(x) \wedge \mathit{etc}_5(x) \supset q(x) \]

where the weight on \(\mathit{etc}_5(x)\) would be less than that on \(\mathit{etc}_4(x)\). An “et cetera” predicate would appear only in the antecedent of a single axiom and never in a consequent. Thus, the “et cetera” predications are only place-holders for assumption costs. They are never proved. They are only assumed.

Let us summarize at this point the most elaborate form axioms in the knowledge base will have. If we wish to express an implicative relation between concepts p and q, the most natural way to do so is as the axiom

\[ (\forall x, z)\, p(x, z) \supset (\exists y)\, q(x, y) \]

where z and y stand for arguments that occur in one predication but not in the other. When we introduce eventualities, this axiom becomes

\[ (\forall e_1, x, z)\, p'(e_1, x, z) \supset (\exists e_2, y)\, q'(e_2, x, y) \]
Using the gen relation to express the tight connection between the two eventualities, the axiom becomes

\[ (\forall e_1, x, z)\, p'(e_1, x, z) \supset (\exists e_2, y)\, q'(e_2, x, y) \wedge \mathit{gen}(e_1, e_2) \]

Next we introduce an “et cetera” proposition into the antecedent to take care of the imprecision of our knowledge of the implicative relation.

\[ (\forall e_1, x, z)\, p'(e_1, x, z) \wedge \mathit{etc}_1(x, z) \supset (\exists e_2, y)\, q'(e_2, x, y) \wedge \mathit{gen}(e_1, e_2) \]

Finally  we  biconditionalize  the  relation  between  p  and  q  by  writing  the  converse  axiom  as 
well: 


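\[ (\forall e_1, x, z)\, p'(e_1, x, z) \wedge \mathit{etc}_1(x, z) \supset (\exists e_2, y)\, q'(e_2, x, y) \wedge \mathit{gen}(e_1, e_2) \]

\[ (\forall e_2, x, y)\, q'(e_2, x, y) \wedge \mathit{etc}_2(x, y) \supset (\exists e_1, z)\, p'(e_1, x, z) \wedge \mathit{gen}(e_2, e_1) \]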

This, then, is the most general formal expression in our abductive logic of what is intuitively felt to be an association between the concepts p and q.

In this article, for notational convenience, we will use the simplest form of axiom we can get away with for the example. The reader should keep in mind, however, that these are only abbreviations for the full, biconditionalized form of the axiom.¹²

5  Some  Local  Pragmatics  Phenomena 

5.1  Definite  Reference 

The following four examples are sometimes taken to illustrate four different kinds of definite reference.¹³

I  bought  a  new  car  last  week.  The  car  is  already  giving  me  trouble. 

I bought a new car last week. The vehicle is already giving me trouble.

I  bought  a  new  car  last  week.  The  engine  is  already  giving  me  trouble. 

The engine of my new car is already giving me trouble.

In the first example, the same word is used in the definite noun phrase as in its antecedent. In the second example, the definite noun phrase uses a superordinate term, “vehicle”, of which the antecedent “car” is a hyponym. In the third example, the reference is not to the “antecedent” but to an object that is related to it, requiring what Clark (1975) has called a “bridging inference”. The fourth example is a determinative definite noun phrase, rather than an anaphoric one; all the information required for its resolution is found in the noun phrase itself.

¹² The full axioms are non-Horn, but not seriously so. They can be Skolemized and broken into two axioms having the same Skolem functions. This remark holds as well for other axioms in this article that have conjunctions in the consequent.

¹³ In all the examples of Section 5, we will ignore weights and costs, show the path to the correct interpretation, and assume the weights and costs are such that this interpretation will be chosen. A great deal of theoretical and empirical research will be required before this will happen in fact, especially in a system with a very large knowledge base.
These distinctions are insignificant in the abductive approach. In each case we need to prove the existence of the definite entity. In the first example it is immediate. In the second, we use the axiom

\[ (\forall x)\, \mathit{car}(x) \supset \mathit{vehicle}(x) \]

In  the  third  example,  we  use  the  axiom 

\[ (\forall x)\, \mathit{car}(x) \supset (\exists y)\, \mathit{engine}(y, x) \]

that  is,  cars  have  engines.  In  the  fourth  example,  we  use  the  same  axiom,  but  after 
assuming  the  existence  of  the  speaker’s  new  car. 

The determiner “the” indicates that the entity is the most salient, mutually identifiable entity of that description. In this article, we deal with this fact by giving the corresponding propositions in the logical form high assumption costs to force resolution and depending on the minimal-cost proof to find the most salient appropriate entity. A more principled approach would take seriously the information presented by the determiner “the”, viewing it as a relation between the entity referred to and the description provided by the rest of the noun phrase, axiomatizing it in terms of mutual knowledge and the discourse situation, and taking it as a proposition in the logical form to be proved.

5.2  Distinguishing  the  Given  and  the  New 

Next  let  us  examine  four  successively  more  difficult  definite  reference  problems  in  which 
the  given  and  the  new  information  are  intertwined  and  must  be  separated.  The  first  is 

Retained  sample  and  filter  element. 

Here “sample” is new information. It was not known before this sentence in the message that a sample was taken. The “filter element”, on the other hand, is given information. It is already known that the compressor’s lube oil system has a filter, and that a filter has a filter element as one of its parts. These facts are represented in the knowledge base by the axioms

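\[ \mathit{filter}(F) \]

\[ (\forall f)\, \mathit{filter}(f) \supset (\exists \mathit{fe})\, \mathit{filter\text{-}element}(\mathit{fe}) \wedge \mathit{part}(\mathit{fe}, f) \]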

Noun phrase conjunction is represented by the predicate andn. The expression andn(x, s, fe) says that x is the typical element of the set consisting of the elements s and fe. Typical elements can be thought of as reified universally quantified variables. Roughly, their properties are inherited by the elements of the set. (See Hobbs, 1983b.) An axiom of pairs says that a set can be formed out of any two elements:

\[ (\forall s, \mathit{fe})(\exists x)\, \mathit{andn}(x, s, \mathit{fe}) \]

The  logical  form  for  the  sentence  is,  roughly, 

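\[ (\exists e, y, x, s, \mathit{fe})\, \mathit{retain}'(e, y, x) \wedge \mathit{andn}(x, s, \mathit{fe}) \wedge \mathit{sample}(s) \wedge \mathit{filter\text{-}element}(\mathit{fe}) \]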
That is, y retained x where x is the typical element of a set consisting of a sample s and a filter element fe. Let us suppose we have no metonymy problems here. Then interpretation is simply a matter of deriving this expression. We can prove the existence of the filter element from the existence of the filter F. We cannot prove the existence of the sample s, so we assume it. It is thus new information. Given s and fe, the axiom of pairs gives us the existence of x and the truth of andn(x, s, fe). We cannot prove the existence of the retaining e, so we assume it; it is likewise new information. This interpretation is illustrated in Figure 3.
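The division of labor between proof and assumption can be sketched in Python. This is a propositional simplification of our own: entity arguments are dropped, weights and costs are ignored, and the helper names (provable, split_proved_assumed) are hypothetical. Conjuncts that back-chain to the knowledge base are proved, and so are given information; the rest must be assumed, and so are new. Since the axiom of pairs has no antecedents, andn is treated here as always provable.

```python
from typing import Dict, List, Set, Tuple

# Propositional sketch: entity arguments are dropped, so each string names
# one proposition.  Rules map a consequent to alternative antecedent lists.
Rules = Dict[str, List[List[str]]]


def provable(atom: str, rules: Rules, facts: Set[str], depth: int = 10) -> bool:
    """Plain backward chaining from the knowledge base, with no assumptions."""
    if atom in facts:
        return True
    if depth == 0:
        return False
    return any(all(provable(a, rules, facts, depth - 1) for a in body)
               for body in rules.get(atom, []))


def split_proved_assumed(logical_form: List[str], rules: Rules,
                         facts: Set[str]) -> Tuple[List[str], List[str]]:
    """Proved conjuncts are given information; assumed ones are new."""
    proved = [a for a in logical_form if provable(a, rules, facts)]
    assumed = [a for a in logical_form if a not in proved]
    return proved, assumed


# "Retained sample and filter element."  filter(F) is known, filters have
# filter elements, and the axiom of pairs (no antecedents) yields andn.
print(split_proved_assumed(["retain'", "andn", "sample", "filter-element"],
                           {"filter-element": [["filter"]]}, {"filter", "andn"}))
# "There was adequate lube oil."  lube-oil(O) is already known.
print(split_proved_assumed(["lube-oil", "adequate"], {}, {"lube-oil"}))
```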

Figure 3: Interpretation of “Retained sample and filter element.” Logical form: retain'(e,y,x) ∧ andn(x,s,fe) ∧ sample(s) ∧ filter-element(fe). Knowledge base: andn(x,s,fe); filter(f) ⊃ filter-element(fe) ∧ part(fe,f); filter(F).

In  the  next  example  new  and  old  information  about  the  same  entity  are  encoded  in  a 
single  noun  phrase. 

There  was  adequate  lube  oil. 

We  know  about  the  lube  oil  already,  and  there  is  a  corresponding  axiom  in  the  knowledge 
base. 


\[ \mathit{lube\text{-}oil}(O) \]

Its adequacy is new information, however. It is what the sentence is telling us.

The  logical  form  of  the  sentence  is,  roughly, 

\[ (\exists o)\, \mathit{lube\text{-}oil}(o) \wedge \mathit{adequate}(o) \]

This is the expression that must be derived. The proof of the existence of the lube oil is immediate. It is thus old information. The adequacy cannot be proved and is hence assumed as new information.

The next example is from Clark (1975), and illustrates what happens when the given and new information are combined into a single lexical item:
John walked into the room.

The  chandelier  shone  brightly. 

What  chandelier  is  being  referred  to? 

Let  us  suppose  we  have  in  our  knowledge  base  the  fact  that  rooms  have  lights: 

\[ (\forall r)\, \mathit{room}(r) \supset (\exists l)\, \mathit{light}(l) \wedge \mathit{in}(l, r) \tag{8} \]

Suppose  we  also  have  the  fact  that  lighting  fixtures  with  several  branches  are  chandeliers: 

\[ (\forall l)\, \mathit{light}(l) \wedge \mathit{has\text{-}branches}(l) \supset \mathit{chandelier}(l) \tag{9} \]

The first sentence has given us the existence of a room, room(R). To solve the definite reference problem in the second sentence, we must prove the existence of a chandelier. Back-chaining on axiom (9), we see we need to prove the existence of a light with branches. Back-chaining from light(l) in axiom (8), we see we need to prove the existence of a room. We have this in room(R). To complete the derivation, we assume the light l has branches. The light is thus given by the room mentioned in the previous sentence, while the fact that it has several branches is new information. This interpretation is illustrated in Figure 4.

Figure 4: Interpretation of “The chandelier . . .” Logical form: . . . ∧ chandelier(x) ∧ . . . Knowledge base: light(l) ∧ has-branches(l) ⊃ chandelier(l), with has-branches(l) assumed; room(r) ⊃ light(l) ∧ in(l,r); room(R).

Note that it is not enough merely to assume the existence of the chandelier, since that would not connect it with the room.

This example may seem to have an unnatural, pseudo-literary quality. There are similar examples, however, which are completely natural. Consider

I  saw  my  doctor  last  week. 

He  told  me  to  get  more  exercise. 

Who does “he” in the second sentence refer to?

Suppose  in  our  knowledge  base  we  have  axioms  encoding  the  fact  that  a  doctor  is  a 
person, 
\[ (\forall d)\, \mathit{doctor}(d) \supset \mathit{person}(d) \tag{10} \]

and the fact that a male person is a “he”,

\[ (\forall d)\, \mathit{person}(d) \wedge \mathit{male}(d) \supset \mathit{he}(d) \tag{11} \]

To solve the reference problem, we must derive

\[ (\exists d)\, \mathit{he}(d) \]

Back-chaining on axioms (11) and (10), matching with the doctor mentioned in the first sentence, and assuming the new information male(d) gives us a derivation.¹⁴

5.3  Lexical  Ambiguity 

The treatment of lexical ambiguity is reasonably straightforward in our framework, adopting an approach advocated by Hobbs (1982a) and similar to the “polaroid word” method of Hirst (1987). The ambiguous word “bank” has a corresponding predicate bank which is true of both financial institutions and the banks of rivers. There are two other predicates, \(\mathit{bank}_1\) true of financial institutions and \(\mathit{bank}_2\) true of banks of rivers. The three predicates are related by the two axioms

\[ (\forall x)\, \mathit{bank}_1(x) \supset \mathit{bank}(x) \]

\[ (\forall x)\, \mathit{bank}_2(x) \supset \mathit{bank}(x) \]

All world knowledge is then expressed in terms of either \(\mathit{bank}_1\) or \(\mathit{bank}_2\), not in terms of bank. In interpreting the text, we use one or the other of the axioms to reach into the knowledge base, and whichever one we use determines the intended sense of the word. Where these axioms are not used, it is apparently because the best interpretation of the text did not require the resolution of the lexical ambiguity.

Consider  the  example 

John  wanted  a  loan.  He  went  to  the  bank. 

Suppose the knowledge base contains the two axioms above as well as the following axioms:

\[ (\forall y)\, \mathit{loan}(y) \supset (\exists x)\, \mathit{financial\text{-}institution}(x) \wedge \mathit{issue}(x, y) \]
that is, loans are issued by financial institutions.

\[ (\forall x)\, \mathit{financial\text{-}institution}(x) \wedge \mathit{etc}_1(x) \supset \mathit{bank}_1(x) \]
that is, one kind of financial institution is a \(\mathit{bank}_1\).

\[ (\forall z)\, \mathit{river}(z) \supset (\exists x)\, \mathit{bank}_2(x) \wedge \mathit{borders}(x, z) \]

¹⁴ Sexists will find this example more compelling if they substitute “she” for “he”.
that is, a river has a \(\mathit{bank}_2\) that borders it.

The proof of the proposition bank(y) in the logical form will use the predicates \(\mathit{bank}_1\) and financial-institution and ground out at loan(L) from the interpretation of the first sentence, and the ambiguity will thereby be resolved. (This interpretation is illustrated in Figure 5.) Of course one can construct a context in which “bank” is resolved the other way, but what one is doing in constructing such a context is modifying the knowledge base, the salience of the axioms, and the surrounding discourse so that the minimum-cost proof of the whole text will be something else.
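The way the cheapest proof selects a word sense can be illustrated with a small propositional sketch in Python. It is not the TACITUS implementation: entity arguments are dropped, factoring and consistency checking are omitted, and the weights (1.0 on the sense axioms, .6 on the bank1 antecedents), the $10 cost on bank, and the function name best_proof are hypothetical. Each goal is either a known fact (free), assumed at its cost, or back-chained with weight times cost passed to each antecedent.

```python
from typing import Dict, FrozenSet, List, Tuple

# Propositional sketch of the "bank" example (entity arguments dropped).
# Each consequent maps to alternative bodies of (antecedent, weight) pairs.
# The weights are hypothetical: the text leaves them unspecified here.
RULES: Dict[str, List[List[Tuple[str, float]]]] = {
    "bank": [[("bank1", 1.0)], [("bank2", 1.0)]],
    "bank1": [[("financial-institution", 0.6), ("etc1", 0.6)]],
    "financial-institution": [[("loan", 1.0)]],
    "bank2": [[("river", 1.0)]],
}


def best_proof(goal: str, cost: float, facts: FrozenSet[str],
               depth: int = 8) -> Tuple[float, Tuple[str, ...]]:
    """Cheapest way to establish goal: (total cost, literals assumed).

    A goal is free if it is a known fact; otherwise it may be assumed at its
    cost or back-chained through a rule, each antecedent inheriting
    weight * cost.  Factoring and consistency checks are omitted.
    """
    if goal in facts:
        return 0.0, ()
    best = (cost, (goal,))
    if depth == 0:
        return best
    for body in RULES.get(goal, []):
        total = 0.0
        assumed: Tuple[str, ...] = ()
        for atom, weight in body:
            sub_cost, sub_assumed = best_proof(atom, weight * cost, facts, depth - 1)
            total += sub_cost
            assumed += sub_assumed
        if total < best[0]:
            best = (total, assumed)
    return best


context = frozenset({"loan"})  # loan(L) from "John wanted a loan."
print(best_proof("bank", 10.0, context))  # (6.0, ('etc1',))
sense_costs = {s: best_proof(s, 10.0, context)[0] for s in ("bank1", "bank2")}
print(sense_costs)  # {'bank1': 6.0, 'bank2': 10.0}
print(best_proof("bank", 10.0, frozenset()))  # (10.0, ('bank',))
```

With a loan in the context, the bank1 route costs only the $6 assumption of etc1, against $10 for bank2, so the financial sense wins. With an empty context neither route beats assuming bank itself, and the ambiguity is simply left unresolved, as the text anticipates.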

Figure 5: Interpretation of “. . . the bank.” Logical form: . . . ∧ bank(x) ∧ . . . Knowledge base: bank1(x) ⊃ bank(x); financial-institution(x) ∧ etc1(x) ⊃ bank1(x); loan(y) ⊃ financial-institution(x) ∧ issue(x,y); loan(L); bank2(x) ⊃ bank(x); river(z) ⊃ bank2(x) ∧ borders(x,z).

Next let us consider an example from Hirst (1987):

The  plane  taxied  to  the  terminal. 

Suppose  the  knowledge  base  consists  of  the  following  axioms: 

\[ (\forall x)\, \mathit{airplane}(x) \supset \mathit{plane}(x) \]
or an airplane is a plane.

\[ (\forall x)\, \mathit{wood\text{-}smoother}(x) \supset \mathit{plane}(x) \]
or a wood smoother is a plane.

\[ (\forall x, y)\, \mathit{move\text{-}on\text{-}ground}(x, y) \wedge \mathit{airplane}(x) \supset \mathit{taxi}(x, y) \]
or for an airplane x to move on the ground to y is for it to taxi to y.

\[ (\forall x, y)\, \mathit{ride\text{-}in\text{-}cab}(x, y) \wedge \mathit{person}(x) \supset \mathit{taxi}(x, y) \]
or for a person x to ride in a cab to y is for x to taxi to y.

\[ (\forall y)\, \mathit{airport\text{-}terminal}(y) \supset \mathit{terminal}(y) \]
or an airport terminal is a terminal.

\[ (\forall y)\, \mathit{computer\text{-}terminal}(y) \supset \mathit{terminal}(y) \]
or a computer terminal is a terminal.

\[ (\forall z)\, \mathit{airport}(z) \supset (\exists x, y)\, \mathit{airplane}(x) \wedge \mathit{airport\text{-}terminal}(y) \]
or airports have airplanes and airport terminals.

The  logical  form  of  the  sentence  will  be,  roughly, 

\[ (\exists x, y)\, \mathit{plane}(x) \wedge \mathit{taxi}(x, y) \wedge \mathit{terminal}(y) \]

The minimal proof of this logical form will involve assuming the existence of an airport, deriving from that the airplane, and thus the plane, and the airport terminal, and thus the terminal, and recognizing the redundancy of the airplane with one of the readings of “taxi”. This interpretation is illustrated in Figure 6.

Hirst solved this problem by marker passing. Charniak (1986) pointed out that marker passing can be viewed as a search through a set of axioms for a proof, where the bindings of variables are ignored. Adopting this account of marker passing, our abductive proof follows essentially the same lines as Hirst’s marker-passing solution. However, whereas Hirst’s marker passing is a largely unmotivated special process in language comprehension, our abductive proof is simply the way interpretation is always done.
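Charniak’s reading of marker passing, as a search through the axioms in which variable bindings are ignored, is easy to sketch. The Python below illustrates that reading, not the abductive prover: it links each consequent predicate to its antecedent predicates in the axioms above (the encoding and the names AXIOMS, reachable and collisions are our own assumptions) and spreads a marker backward from each word. The predicate reached from all three words is airport, the same node whose assumption anchors the minimal abductive proof.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Predicate-level links from the axioms: (consequent, antecedent predicates).
# Variable bindings are deliberately ignored, as in Charniak's account.
AXIOMS: List[Tuple[str, List[str]]] = [
    ("plane", ["airplane"]),
    ("plane", ["wood-smoother"]),
    ("taxi", ["move-on-ground", "airplane"]),
    ("taxi", ["ride-in-cab", "person"]),
    ("terminal", ["airport-terminal"]),
    ("terminal", ["computer-terminal"]),
    ("airplane", ["airport"]),          # airport(z) > airplane(x) & ...
    ("airport-terminal", ["airport"]),  # airport(z) > ... & airport-terminal(y)
]


def backward_links(axioms: List[Tuple[str, List[str]]]) -> Dict[str, Set[str]]:
    """Map each consequent predicate to every predicate it back-chains to."""
    links: Dict[str, Set[str]] = {}
    for head, body in axioms:
        links.setdefault(head, set()).update(body)
    return links


def reachable(start: str, links: Dict[str, Set[str]]) -> Set[str]:
    """Spread a marker from start by breadth-first back-chaining."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in links.get(queue.popleft(), set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def collisions(words: List[str], axioms: List[Tuple[str, List[str]]]) -> Set[str]:
    """Predicates reached by the markers of every word."""
    links = backward_links(axioms)
    return set.intersection(*(reachable(w, links) for w in words))


print(sorted(collisions(["plane", "taxi", "terminal"], AXIOMS)))  # ['airport']
print(sorted(collisions(["plane", "taxi"], AXIOMS)))  # ['airplane', 'airport']
```

What the marker search cannot do is check that the airplane reached from “plane” and the one reached from “taxi” are the same individual; in the abductive proof that identification is made by unifying the two airplane goals.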

5.4 Compound Nominals

In a compound nominal such as “turpentine jar”, the logical form we need to prove consists of three propositions, one for each noun and one for the relation between them.

\[ (\exists x, y)\, \mathit{turpentine}(y) \wedge \mathit{jar}(x) \wedge \mathit{nn}(y, x) \]

Proving  nn(y,x)  constitutes  discovering  the  implicit  relation  between  the  nouns. 

Suppose  our  knowledge  base  consists  of  the  following  axioms: 

\[ (\forall y)\, \mathit{liquid}(y) \wedge \mathit{etc}_1(y) \supset \mathit{turpentine}(y) \]

or  one  kind  of  liquid  is  turpentine. 

\[ (\forall e_1, x, y)\, \mathit{function}(e_1, x) \wedge \mathit{contain}'(e_1, x, y) \wedge \mathit{liquid}(y) \wedge \mathit{etc}_2(e_1, x, y) \supset \mathit{jar}(x) \]
Figure 6: Interpretation of “The plane taxied to the terminal.” Logical form: plane(x) ∧ taxi(x,y) ∧ terminal(y). Knowledge base: airplane(x) ⊃ plane(x); move-on-ground(x,y) ∧ airplane(x) ⊃ taxi(x,y); airport-terminal(y) ⊃ terminal(y); airport(z) ⊃ airplane(x) ∧ airport-terminal(y); wood-smoother(x) ⊃ plane(x); ride-in-cab(x,y) ∧ person(x) ⊃ taxi(x,y); computer-terminal(y) ⊃ terminal(y). Assumed: move-on-ground(x,y) and airport(z).

or if the function of something is to contain liquid, then it may be a jar.

(∀e₁,x,y) contain′(e₁,x,y) ⊃ nn(y,x)

or one possible implicit relation in compound nominals is the “contains” relation.

Then the minimal proof of the logical form will take the liquid turpentine to be the
same as the liquid implicit in “jar” and take the nn relation to be the “contains” relation
implicit in “jar”. This is illustrated in Figure 7.
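Merging the two liquids is an instance of factoring by unification. The Python sketch below assumes a toy representation (literals as tuples, variables as capitalized strings) and merges each goal into the first earlier goal it unifies with. That greedy policy is a simplification, since in the abduction scheme factoring is an option whose benefit is weighed by cost, and the goal list is a hand-written rendering of the literals left after back-chaining on the logical form.

```python
def is_var(term):
    """Variables are capitalized strings; predicates and constants are not."""
    return isinstance(term, str) and term[:1].isupper()


def walk(term, subst):
    """Follow bindings until reaching an unbound variable or a constant."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Most general unifier of two flat literals (no occurs check), or None."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    subst = dict(subst)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None
    return subst


def factor(goals):
    """Greedily merge each goal into the first earlier goal it unifies with."""
    subst, kept = {}, []
    for goal in goals:
        for other in kept:
            merged = unify(goal, other, subst)
            if merged is not None:
                subst = merged
                break
        else:
            kept.append(goal)
    return [tuple(walk(t, subst) for t in lit) for lit in kept], subst


# Goals after back-chaining on turpentine(Y), jar(X), and nn(Y,X).
goals = [
    ("liquid", "Y"), ("etc1", "Y"),                       # from turpentine(Y)
    ("function", "E1", "X"), ("contain'", "E1", "X", "Y2"),
    ("liquid", "Y2"), ("etc2", "E1", "X", "Y2"),          # from jar(X)
    ("contain'", "E3", "X", "Y"),                         # from nn(Y,X)
]
remaining, bindings = factor(goals)
print(bindings)
print(remaining)
```

Unifying liquid(Y2) with liquid(Y) identifies the jar's contents with the turpentine, and the nn goal then unifies with the jar's own “contains” literal.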

If nn were taken to be a predicate variable, then the last axiom would not be required.
When nn is taken to be a predicate variable, we can see that a very common case of
compound nominals simply falls out, namely, where the head noun is a relational noun
and the prenominal noun is one of its arguments. Consider “oil sample”, and suppose that
sample is a two-argument predicate whose arguments are the sample itself and the substance it is a sample of.
The logical form of the noun phrase, before the compound nominal is interpreted, is

(∃x,y,z,nn) oil(y) ∧ sample(x,z) ∧ nn(y,x)

To interpret this we need to recognise that z is y. But that is exactly what will result
if we take the nn relation to be the sample relation itself, unifying y and z. We need a
salient relation between the two nouns, but the most salient relation is the one provided
by the head noun.

[Figure 7: Interpretation of “turpentine jar”. Logical form: turpentine(y) ∧ nn(y,x) ∧ jar(x). Knowledge base: liquid(y) ∧ etc₁(y) ⊃ turpentine(y); contain′(e₁,x,y) ⊃ nn(y,x).]

Another  case  of  compound  nominal  interpretation  that  can  be  seen  to  fall  out  of 
a  predicate  variable  approach  is  what  Jack  Kulas  (personal  communication)  has  called 
“referential compound nominals”. Consider

Half  the  people  will  study  the  role  of  women  in  the  early  history  of  California. 

Half  the  people  will  study  the  role  of  women  in  the  early  history  of  Texas. 

The  California  people  should  finish  their  reports  by  October  15. 

The  relation  encoded  in  the  compound  nominal  “California  people”  is 

λx,y [x will study the role of women in the early history of y]

but this is exactly the relation that is made salient by the previous two sentences.

5.5  Exploiting  Redundancy 

We  next  show  the  use  of  the  abduction  scheme  in  solving  internal  coreference  problems. 
Two problems raised by the sentence

The  plain  was  reduced  by  erosion  to  its  present  level. 

are  determining  what  was  eroding  and  determining  what  “it”  refers  to.  Suppose  our 
knowledge base consists of the following axioms:

(∀p,l,s) decrease(p,l,s) ∧ vertical(s) ∧ etc₃(p,l,s) ⊃ (∃e,x) reduce′(e,x,p,l)

or if p decreases to l on some (real or metaphorical) vertical scale s (plus some other
conditions), then there is an e which is a reducing by something x of p to l.
(∀p) landform(p) ∧ flat(p) ∧ etc₄(p) ⊃ plain(p)

or  p  is  a  plain  if  p  is  a  flat  landform  (plus  some  other  conditions). 

(∀e,y,l,s) at′(e,y,l) ∧ on(l,s) ∧ vertical(s) ∧ flat(y) ∧ etc₅(e,y,l,s)

⊃ level′(e,l,y)

or e is the condition of l’s being the level of y if e is the condition of y’s being at l on some
vertical scale s and y is flat (plus some other conditions).

(∀x,l,s) decrease(x,l,s) ∧ landform(x) ∧ altitude(s) ∧ etc₆(x,l,s)

⊃ (∃e) erode′(e,x)

or e is an eroding of x if x is a landform that decreases to some point l on the altitude
scale s (plus some other conditions).

(∀s) vertical(s) ∧ etc₇(s) ⊃ altitude(s)

or s is the altitude scale if s is vertical (plus some other conditions).

Now the analysis. The logical form of the sentence is roughly

(∃e₁,e₂,p,l,x,e₃,y) reduce′(e₁,e₂,p,l) ∧ plain(p) ∧ erode′(e₂,x) ∧ present(e₃)
∧ level′(e₃,l,y)

Our  characterization  of  interpretation  says  that  we  must  derive  this  expression  from  the 
axioms or from assumptions. Back-chaining on reduce′(e₁,e₂,p,l) yields

decrease(p,l,s₁) ∧ vertical(s₁) ∧ etc₃(p,l,s₁)

Back-chaining on erode′(e₂,x) yields

decrease(x,l₂,s₂) ∧ landform(x) ∧ altitude(s₂) ∧ etc₆(x,l₂,s₂)

and back-chaining on altitude(s₂) in turn yields

vertical(s₂) ∧ etc₇(s₂)

We unify the goals decrease(p,l,s₁) and decrease(x,l₂,s₂), and thereby identify the object
x of the erosion with the plain p. The goals vertical(s₁) and vertical(s₂) also unify, telling
us the reduction was on the altitude scale. Back-chaining on plain(p) yields

landform(p) ∧ flat(p) ∧ etc₄(p)

and landform(x) unifies with landform(p), reinforcing our identification of the object of
the erosion with the plain. Back-chaining on level′(e₃,l,y) yields

at′(e₃,y,l) ∧ on(l,s₃) ∧ vertical(s₃) ∧ flat(y) ∧ etc₅(e₃,y,l,s₃)
and vertical(s₃) and vertical(s₂) unify, as do flat(y) and flat(p), thereby identifying
“it”, or y, as the plain p. We have not written out the axioms for this, but note also that
“present” implies the existence of a change of level, or a change in the location of “it” on
a vertical scale, and a decrease of a plain is a change of the plain’s location on a vertical
scale. Unifying these would provide reinforcement for our identification of “it” with the
plain. Now, assuming the most specific atomic formulas we have derived, including all the
“et cetera” conditions, we arrive at an interpretation that is minimal and that solves the
internal coreference problems as a by-product.*
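The identifications in this derivation amount to merging variables into equivalence classes, which a union-find structure tracks directly. In this hypothetical Python sketch the goal literals are copied by hand from the derivation above (with altitude(s₂) already back-chained), and every pair of literals with the same predicate is unified. That blanket policy happens to be safe here, but it simplifies the cost-guided factoring of the abduction scheme.

```python
class UnionFind:
    """Disjoint sets of variables; variables in one set denote one entity."""

    def __init__(self):
        self.parent = {}

    def find(self, v):
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


# Goal literals from the derivation above.
goals = [
    ("decrease", "p", "l", "s1"), ("vertical", "s1"), ("etc3", "p", "l", "s1"),
    ("decrease", "x", "l2", "s2"), ("landform", "x"), ("etc6", "x", "l2", "s2"),
    ("vertical", "s2"), ("etc7", "s2"),
    ("landform", "p"), ("flat", "p"), ("etc4", "p"),
    ("at'", "e3", "y", "l"), ("on", "l", "s3"), ("vertical", "s3"), ("flat", "y"),
    ("etc5", "e3", "y", "l", "s3"),
]

uf = UnionFind()
first = {}  # predicate -> arguments of its first occurrence
for pred, *args in goals:
    if pred in first:
        for a, b in zip(first[pred], args):  # unify argument by argument
            uf.union(a, b)
    else:
        first[pred] = args

for var in ("x", "y", "l2", "s3"):
    print(var, "is identified with", uf.find(var))
```

Both the object of the erosion (x) and the pronoun (y) end up in the same class as the plain p, and the three scales collapse into one.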

This  interpretation  is  illustrated  in  Figure  8.  (The  factoring  of  two  literals  is  indicated 
by  marking  one  as  assumed  and  deriving  the  second  from  it.) 

[Figure 8: Interpretation of “The plain was reduced by erosion to its present level.” Panels: Knowledge Base; Logical Form.]


5.6  The  Four  Local  Pragmatics  Problems  At  Once 

Let  us  now  return  to  the  example  of  Section  3. 

* This example was analyzed in a similar manner in Hobbs (1978), but not in such a clean fashion, since
it was without benefit of the abduction scheme.
Disengaged compressor after lube-oil alarm.

Recall that we must resolve the reference of “compressor” and “alarm”, discover the
implicit relation between the lube oil and the alarm, attach “after alarm” to either the
compressor or the disengaging, and expand “after alarm” into “after the sounding of the
alarm”.

Suppose our knowledge base includes the following axioms: There are a compressor C,
an alarm A, lube oil O, and the pressure P of the lube oil O at A:

compressor(C), alarm(A), lube-oil(O), pressure(P,O,A)

The  alarm  is  for  the  lube  oil: 

for(A,O)

The for relation is a possible nn relation:

(∀a,o) for(a,o) ⊃ nn(o,a)

A disengaging e₁ by x of c is an event:

(∀e₁,x,c) disengage′(e₁,x,c) ⊃ event(e₁)

If the pressure p of the lube oil o at the alarm a is inadequate, then there is a sounding
e₂ of the alarm, and that sounding is the function of the alarm:

(∀a,o,p) alarm(a) ∧ lube-oil(o) ∧ pressure(p,o,a) ∧ inadequate(p)

⊃ (∃e₂) sound′(e₂,a) ∧ function(e₂,a)

A  sounding  is  an  event: 

(∀e₂,a) sound′(e₂,a) ⊃ event(e₂)

An  entity  can  be  coerced  into  its  function: 

(∀e₂,a) function(e₂,a) ⊃ rel(e₂,a)

Identity is a possible coercion:

(∀x) rel(x,x)

Finally,  we  have  axioms  encoding  set  membership: 

(∀y,s) y ∈ {y} ∪ s
(∀y,z,s) y ∈ s ⊃ y ∈ {z} ∪ s

Of the possible metonymy problems, let us confine ourselves to one posed by “after”.
Then  the  expression  that  needs  to  be  derived  for  an  interpretation  is 

(∃e₁,x,c,k₁,k₂,y,a,o) disengage′(e₁,x,c) ∧ compressor(c) ∧ after(k₁,k₂)
∧ event(k₁) ∧ rel(k₁,y) ∧ y ∈ {c,e₁} ∧ event(k₂) ∧ rel(k₂,a)

∧ alarm(a) ∧ lube-oil(o) ∧ nn(o,a)




One way for rel(k₁,y) to be true is for k₁ and y to be identical. We can back-chain from
event(k₁) to obtain disengage′(k₁,x₁,c₁). This can be merged with disengage′(e₁,x,c),
yielding an interpretation in which the attachment y of the prepositional phrase is to
“disengage”. This identification of y with e₁ is consistent with the constraint y ∈ {c,e₁}. The
conjunct disengage′(e₁,x,c) cannot be proved and must be assumed as new information.

The conjuncts compressor(c), lube-oil(o), and alarm(a) can be proved immediately,
resolving c to C, o to O, and a to A. The compound nominal relation nn(O,A) is true
because for(A,O) is true. One way for event(k₂) to be true is for sound′(k₂,a) to be
true, and function(k₂,A) is one way for rel(k₂,A) to be true. Back-chaining on each
of these and merging the results yields the goals alarm(A), lube-oil(o), pressure(p,o,A),
and inadequate(p). The first three of these can be derived immediately, thus identifying o
as O and p as P, and inadequate(P) is assumed. We have thereby coerced the alarm into
the sounding of the alarm, and as a by-product we have drawn the correct implicature
(that is, we have assumed) that the lube oil pressure is inadequate. This interpretation is
illustrated in Figure 9.
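The two decisions made here, attaching “after alarm” to the disengaging and coercing the alarm into its sounding, can be mimicked by filtering candidates through the event constraint. This Python sketch is illustrative only: the constants (C, A, P, e1, sound_A) and the small fact tables are hypothetical stand-ins for the axioms above, and the code enumerates readings rather than searching for a weighted proof.

```python
# Hypothetical ground facts mirroring the axioms above.
EVENTS = {"e1", "sound_A"}    # event(e1) via disengage'; event(sound_A) via sound'
FUNCTION = {"A": "sound_A"}   # function(sound_A, A): sounding is the alarm's function


def coercions(entity):
    """Ways to make rel(k, entity) true, with the assumptions each one needs."""
    yield entity, []                          # identity: rel(x, x)
    if entity in FUNCTION:                    # function(e2, a) ⊃ rel(e2, a)
        # function(e2, a) can only be derived by assuming inadequate(P)
        yield FUNCTION[entity], ["inadequate(P)"]


def readings(sites, alarm):
    """Readings of after(k1, k2) with y in sites, rel(k1, y), event(k1),
    rel(k2, alarm), and event(k2)."""
    found = []
    for y in sites:
        for k1, assumed1 in coercions(y):
            if k1 not in EVENTS:
                continue
            for k2, assumed2 in coercions(alarm):
                if k2 in EVENTS:
                    found.append((y, k2, assumed1 + assumed2))
    return found


print(readings(["C", "e1"], "A"))
```

Only one reading survives: “after” attaches to the disengaging e1, the alarm is coerced to its sounding, and inadequate(P) is the sole assumption.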

5.7  Schema  Recognition 

One of the most common views of “understanding” in artificial intelligence has been that
to understand a text is to match it with some pre-existing schema. In our view, this is far
too limited a notion. But it is interesting to note that this sort of processing falls out of
our abduction method, provided schemas are expressed as axioms in the right way.

Let  us  consider  an  example.  RAINFORM  messages  are  messages  about  sightings  and 
pursuits  of  enemy  submarines,  generated  during  naval  maneuvers.  A  typical  message 
might  read,  in  part, 

Visual sighting of periscope followed by attack with ASROC and torpedos.
Submarine  went  sinker. 

An “ASROC” is an anti-submarine rocket, and to go sinker is to submerge. These messages
generally follow a single, rather simple schema. An enemy sub is sighted by one of our
ships. The sub either evades our ship or is attacked. If it is attacked, it is either damaged
or destroyed, or it escapes.

A  somewhat  simplified  version  of  this  schema  can  be  encoded  in  an  axiom  as  follows: 

(∀e₁,e₂,e₃,x,y) sub-sighting-schema(e₁,e₂,e₃,x,y,...)

⊃ sight′(e₁,x,y) ∧ friendly(x) ∧ ship(x) ∧ enemy(y) ∧ sub(y)

∧ then(e₁,e₂) ∧ attack′(e₂,x,y) ∧ sub-sighting-outcome(e₃,e₂,x,y)

That is, if we are in a submarine-sighting situation, with all of its associated roles e₁, x,
y, and so on, then a number of things are true. There is a sighting e₁ by a friendly ship
x of an enemy sub y. Then there is an attack e₂ by x on y, with some outcome e₃.

Among  the  possible  outcomes  is  y’s  escaping  from  x,  which  we  can  express  as  follows: 

(∀e₃,e₂,x,y) sub-sighting-outcome(e₃,e₂,x,y) ∧ etc₁(e₃,e₂,x,y)

⊃ escape′(e₃,y,x)
[Figure 9: Interpretation of “Disengaged compressor after lube-oil alarm.” Panels: Knowledge Base; Logical Form.]
We express it here in this direction because we will have to backward-chain from the escape
to the outcome, and on to the schema.

The other facts that need to be encoded are as follows:

(∀y) sub(y) ⊃ (∃z) periscope(z) ∧ part(z,y)

That  is,  a  sub  has  a  periscope  as  one  of  its  parts. 

(∀e₁,e₂) then(e₁,e₂) ⊃ follow(e₂,e₁)

That is, if e₁ and e₂ occur in temporal succession (then), then e₂ follows e₁.

(∀e₃,y,x) escape′(e₃,y,x) ∧ etc₂(e₃,x,y) ⊃ submerge′(e₃,y)

That  is,  submerging  is  one  way  of  escaping. 

(∀e₃,y) submerge′(e₃,y) ⊃ go-sinker′(e₃,y)

That is, submerging implies going sinker.

In  order  to  interpret  the  first  sentence  of  the  example,  we  must  prove  its  logical  form, 
which  is,  roughly, 

(∃e₁,x,z,e₂,u,v,a,t) sight′(e₁,x,z) ∧ visual(e₁) ∧ periscope(z)

∧ follow(e₂,e₁) ∧ attack′(e₂,u,v) ∧ with(e₂,a)

∧ ASROC(a) ∧ with(e₂,t) ∧ torpedo(t)

and the logical form for the second sentence, roughly, is the following:

(∃e₃,y₁) go-sinker′(e₃,y₁) ∧ sub(y₁)

When  we  backward-chain  from  the  logical  forms  using  the  given  axioms,  we  end  up,  most 
of  the  time,  with  different  instances  of  the  schema  predication 

sub-sighting-schema(e₁,e₂,e₃,x,y,...)

as goal expressions. Since our abductive inference method merges unifiable goal
expressions, all of these are unified, and this single instance is assumed. Since it is almost the
only expression that had to be assumed, we have a very economical interpretation for the
entire text.
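The economy gained by merging schema instances can be sketched as role-by-role unification. In this hypothetical Python fragment, each observed literal has back-chained to its own copy of the schema predication, with fresh variables (named _G0, _G1, and so on) for the roles it does not mention. The role list and the choice of which roles each literal fixes are simplified for illustration and are not the RAINFORM encoding.

```python
from itertools import count

ROLES = ("e1", "e2", "e3", "x", "y")
_fresh = count()


def schema_instance(**known):
    """sub-sighting-schema(e1,e2,e3,x,y) with fresh variables for other roles."""
    return {r: known[r] if r in known else f"_G{next(_fresh)}" for r in ROLES}


def merge(instances):
    """Unify all schema instances role by role; return one merged instance."""
    parent = {}

    def find(v):
        while parent.get(v, v) != v:
            v = parent[v]
        return v

    base = instances[0]
    for inst in instances[1:]:
        for r in ROLES:
            a, b = find(base[r]), find(inst[r])
            if a == b:
                continue
            if a.startswith("_G"):   # prefer the text's own variable names
                a, b = b, a
            parent[b] = a
    return {r: find(base[r]) for r in ROLES}, find


# One schema instance per observed literal, after back-chaining (illustrative).
instances = [
    schema_instance(e1="e1", x="x"),          # sight'(e1, x, z)
    schema_instance(e2="e2", x="u", y="v"),   # attack'(e2, u, v)
    schema_instance(e1="e1", e2="e2"),        # follow(e2, e1), via then(e1, e2)
    schema_instance(e3="e3", y="y1"),         # go-sinker'(e3, y1), via escape'
]
merged, find = merge(instances)
print(len(instances), "schema goals merged into one:", merged)
```

After merging, the attacker u is identified with the sighting ship x and the submerging sub y1 with the attacked sub v, so a single schema assumption covers the whole text.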

To summarize, when a large chunk of organized knowledge comes to be known, it can
be encoded in a single axiom whose antecedent is a “schema predicate” applied to all of
the role fillers in the schema. When a text describes a situation containing many of the
entities and properties that occur in the consequent of the schema axiom, then very often
the most economical interpretation of the text will be achieved by assuming the schema
predicate, appropriately instantiated. If we were to break up the schema axiom into a
number of axioms, each expressing different stereotypical features of the situation and
each having in its antecedent the conjunction of a schema predication and an et cetera
predication, default values for role fillers could be inferred where and only where they were
appropriate and consistent.

When we do schema recognition in this way, there is no problem, as there is in other
approaches,  with  merging  several  schemas.  It  is  just  a  matter  of  assuming  more  than  one 
schema  predication  with  the  right  instantiations  of  the  variables. 
6 A Thorough Integration of Syntax, Semantics, and Pragmatics

6.1  The  Integration 

By combining the idea of interpretation as abduction with the older idea of parsing as
deduction (Kowalski, 1980, pp. 52-53; Pereira and Warren, 1983), it becomes possible to
integrate syntax, semantics, and pragmatics in a very thorough and elegant way.*

We will present this in terms of example (2), repeated here for convenience.

(2)  The  Boston  office  called. 

Recall  that  to  interpret  this  we  must  prove  the  expression 

(3a) (∃x,y,z,e) call′(e,x) ∧ person(x) ∧ rel(x,y)

(3b) ∧ office(y) ∧ Boston(z) ∧ nn(z,y)

Consider  now  a  simple  grammar,  adequate  for  parsing  this  sentence,  written  in  Prolog 
style: 

(∀w₁,w₂) np(w₁) ∧ verb(w₂) ⊃ s(w₁ w₂)

(∀w₁,w₂) det(the) ∧ noun(w₁) ∧ noun(w₂) ⊃ np(the w₁ w₂)

That is, if string w₁ is a noun phrase and string w₂ is a verb, then the concatenation w₁
w₂ is a sentence. The second rule is interpreted similarly. To parse a sentence W is to
prove s(W).
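Read as Horn clauses over word strings, the two rules already give a recognizer: to prove s(W), try each way of splitting W. The Python sketch below is a hypothetical rendering of this top-down proof search (the four-word lexicon is assumed for the example); it mirrors the two rules rather than any particular Prolog system.

```python
# Hypothetical lexicon: word -> set of lexical categories.
LEXICON = {
    "the": {"det"},
    "boston": {"noun"},
    "office": {"noun"},
    "called": {"verb"},
}


def has_cat(word, cat):
    return cat in LEXICON.get(word, set())


def np(words):
    """det(the) ∧ noun(w1) ∧ noun(w2) ⊃ np(the w1 w2)"""
    return (len(words) == 3 and words[0] == "the"
            and has_cat(words[1], "noun") and has_cat(words[2], "noun"))


def s(words):
    """np(w1) ∧ verb(w2) ⊃ s(w1 w2), trying every split point."""
    return any(np(words[:i]) and len(words[i:]) == 1 and has_cat(words[i], "verb")
               for i in range(1, len(words)))


print(s("The Boston office called".lower().split()))
```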

We can integrate syntax, semantics, and local pragmatics by augmenting the axioms
of this grammar with portions of the logical form in the appropriate places, as follows:

(12) (∀w₁,w₂,y,p,e,x) np(w₁,y) ∧ verb(w₂,p) ∧ p′(e,x) ∧ rel(x,y) ∧ Req(p,x)

⊃ s(w₁ w₂, e)

(13) (∀w₁,w₂,q,r,y,z) det(the) ∧ noun(w₁,r) ∧ noun(w₂,q)

∧ r(z) ∧ q(y) ∧ nn(z,y) ⊃ np(the w₁ w₂, y)

The second arguments of the “lexical” predicates noun and verb denote the predicates
corresponding to the words, such as Boston, office, or call. The atomic formula np(w₁,y)
means that the string w₁ is a noun phrase referring to y. The atomic formula Req(p,x)
stands for the requirements that the predicate p places on its argument x. The specific
constraint can then be enforced if there is an axiom

(∀x) person(x) ⊃ Req(call,x)

* This idea is due to Stuart Shieber.
that says that one way for the requirements to be satisfied is for x to be a person. Axiom
(12) can then be paraphrased as follows: “If w₁ is a noun phrase referring to y, and w₂
is a verb denoting the predicate p, and p′ is true of some eventuality e and some entity
x, and x is related to (or coercible from) y, and x satisfies the requirements p′ places on
its second argument, then the concatenation w₁ w₂ is a sentence describing eventuality
e.” Axiom (13) can be paraphrased as follows: “If the is a determiner, and w₁ is a noun
denoting the predicate r, and w₂ is a noun denoting the predicate q, and the predicate r
is true of some entity z, and the predicate q is true of some entity y, and there is some
implicit relation nn between z and y, then the concatenation the w₁ w₂ is a noun phrase
referring to the entity y.” Note that the conjuncts from line (3a) in the logical form have
been incorporated into axiom (12) and the conjuncts from line (3b) into axiom (13).*
The  parse  and  interpretation  of  sentence  (2)  is  illustrated  in  Figure  10. 


[Figure 10: Parse and interpretation of “The Boston office called.” The root of the proof tree is s(“The Boston office called.”, e).]

Before, when we proved s(W), we proved that W was a sentence. Now, if we prove
(∃e) s(W,e), we prove that W is an interpretable sentence and that the eventuality e is its
interpretation.

Each axiom in the “grammar” then has a “syntactic” part (the conjuncts like np(w₁,y)
and verb(w₂,p)) that specifies the syntactic structure, and a “pragmatic” part (the
conjuncts like p′(e,x) and rel(x,y)) that drives the interpretation. That is, local pragmatics
is captured by virtue of the fact that in order to prove (∃e) s(W,e), one must derive the
logical form of the sentence together with the constraints predicates impose on their
arguments, allowing for metonymy. The compositional semantics of the sentence is specified
by the way the denotations given in the syntactic part are used in the construction of the
pragmatic part.
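Axioms (12) and (13) can be added to the same kind of recognizer so that a successful parse also yields the pragmatic conjuncts. In this hypothetical Python sketch the lexicon pairs each word with the predicate it denotes, variable names are fixed strings rather than fresh variables, and the conjuncts are only collected, not proved or assumed; the result is the logical form of (3a) and (3b) that the abductive prover would then have to establish.

```python
# Hypothetical lexicon: word -> (category, predicate the word denotes).
LEXICON = {
    "boston": ("noun", "Boston"),
    "office": ("noun", "office"),
    "called": ("verb", "call"),
}
NONE = (None, None)


def np(words):
    """Axiom (13): np(the w1 w2, y), returned as (y, pragmatic conjuncts)."""
    if len(words) != 3 or words[0] != "the":
        return None
    (cat1, r), (cat2, q) = LEXICON.get(words[1], NONE), LEXICON.get(words[2], NONE)
    if cat1 != "noun" or cat2 != "noun":
        return None
    return "y", [f"{r}(z)", f"{q}(y)", "nn(z,y)"]


def s(words):
    """Axiom (12): s(w1 w2, e), returned as (e, conjuncts to prove or assume)."""
    for i in range(1, len(words)):
        subject, rest = np(words[:i]), words[i:]
        if subject is None or len(rest) != 1:
            continue
        cat, p = LEXICON.get(rest[0], NONE)
        if cat != "verb":
            continue
        y, conjuncts = subject
        return "e", conjuncts + [f"{p}'(e,x)", f"rel(x,{y})", f"Req({p},x)"]
    return None


print(s("The Boston office called".lower().split()))
```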

* These axioms are second-order, but not seriously so, since the predicate variables only need
to be instantiated to predicate constants, not to lambda expressions. It is thus easy to convert them to
first-order axioms by having an individual constant corresponding to every predicate constant.
One final modification is necessary, since the elements of the pragmatic part have to
be  assumable.  If  we  wish  to  get  the  same  costs  on  the  conjuncts  in  the  logical  form  that 
we  proposed  at  the  end  of  Section  3,  we  need  to  augment  our  formalism  to  allow  attaching 
assumability  costs  directly  to  some  of  the  conjuncts  in  the  antecedents  of  Horn  clauses. 
Continuing  to  use  the  arbitrary  costs  we  have  used  before,  we  would  thus  rewrite  the 
axioms  as  follows: 

(14) (∀w₁,w₂,y,p,e,x) np(w₁,y) ∧ verb(w₂,p) ∧ p′(e,x)^$3 ∧ rel(x,y)^$20

∧ Req(p,x)^$10 ⊃ s(w₁ w₂, e)

(15) (∀w₁,w₂,q,r,y,z) det(the) ∧ noun(w₁,r) ∧ noun(w₂,q)

∧ r(z)^$… ∧ q(y)^$… ∧ nn(z,y)^$… ⊃ np(the w₁ w₂, y)


The first axiom now says what it did before, but in addition we can assume p′(e,x) for a
cost of $3, rel(x,y) for a cost of $20, and Req(p,x) for a cost of $10.*
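The role of the costs in axiom (14) comes down to simple arithmetic over the conjuncts that cannot be proved. The costs below are the ones given in the text; which conjuncts count as proved in each call is an assumption made for the example.

```python
# Assumability costs on the pragmatic conjuncts of axiom (14), in dollars.
COSTS = {"p'(e,x)": 3, "rel(x,y)": 20, "Req(p,x)": 10}


def assumption_cost(proved):
    """Total cost of the pragmatic conjuncts of axiom (14) that must be assumed."""
    return sum(cost for literal, cost in COSTS.items() if literal not in proved)


# Suppose rel(x,y) and Req(p,x) are proved, so only p'(e,x) is new information.
print(assumption_cost({"rel(x,y)", "Req(p,x)"}))  # 3
# If nothing can be proved, every conjunct is assumed.
print(assumption_cost(set()))  # 33
```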

Implementations of different orders of interpretation, or different sorts of interaction
among syntax, compositional semantics, and local pragmatics, can then be seen as different
orders of search for a proof of (∃e) s(W,e). In a syntax-first order of interpretation, one
would try first to prove all the “syntactic” atomic formulas, such as np(w₁,y), before any of
the “local pragmatics” atomic formulas, such as p′(e,x). Verb-driven interpretation would
first try to prove verb(w₂,p) and would then use the information in the requirements
associated with the verb to drive the search for the arguments of the verb, by deriving
Req(p,x) before back-chaining on np(w₁,y). But more fluid orders of interpretation are
obviously possible. This formulation allows one to prove first those things that are
easiest to prove, and therefore allows one to exploit the fact that the strongest clues to
the meaning of a sentence can come from a variety of sources: its syntax, the semantics
of its main verb, the reference of its noun phrases, and so on. It is also easy to see how
processing could occur in parallel, insofar as parallel Prolog is possible.

In principle, at least, this approach to linguistic structure can be carried to finer-grained
levels. The input to the interpretation process could be speech information. Josephson
(1990b) and Fox and Josephson (1991), among others, are exploring this idea. The
approach can also be applied on a larger scale to discourse structure. This is explored
below in Section 6.3. But first we see how the approach can be applied to the problem of
syntactically ill-formed utterances.

6.2 Syntactically Ill-Formed Utterances

It is straightforward to extend this approach to deal with ill-formed or unclear utterances,
by first giving the expression to be proved, (∃e) s(W,e), an assumability cost and then
adding weights to the syntactic part of the axioms. Thus, axiom (14) can be revised as
follows:

* The costs, rather than weights, on the conjuncts in the antecedents are already permitted if we allow,
as Stickel (1989) does, arbitrary functions rather than multiplicative weights.
(∀w₁,w₂,y,p,e,x) np(w₁,y)^$… ∧ verb(w₂,p) ∧ p′(e,x)^$3 ∧ rel(x,y)^$20 ∧ Req(p,x)^$10

⊃ s(w₁ w₂, e)

This  says  that  if  you  find  a  verb,  then  for  a  small  cost  you  can  go  ahead  and  assume 
there  is  a  noun  phrase,  allowing  us  to  interpret  utterances  without  subjects,  which  are 
very  common  in  certain  kinds  of  informal  discourse,  including  equipment  failure  reports 
and  naval  operation  reports.  In  this  case,  the  variable  y  will  have  no  identifying  properties 
other  than  what  the  verb  phrase  gives  it. 
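The revised axiom turns a missing subject into a priced assumption rather than a parse failure. The Python sketch below is hypothetical in two respects: the text says only that the cost is small, so NP_COST = 4 is an illustrative placeholder, and the noun phrase may only be assumed over an empty string of words. The pragmatic conjuncts are omitted so that only the syntactic choice is shown.

```python
NP_COST = 4  # hypothetical "small cost" for assuming a missing noun phrase

LEXICON = {"the": "det", "boston": "noun", "office": "noun", "called": "verb"}


def is_np(words):
    """det(the) ∧ noun(w1) ∧ noun(w2) ⊃ np(the w1 w2), as in axiom (13)."""
    return (len(words) == 3 and words[0] == "the"
            and all(LEXICON.get(w) == "noun" for w in words[1:]))


def parse_cost(words):
    """Cheapest proof of s(words) from np(w1)^NP_COST ∧ verb(w2), or None."""
    best = None
    for i in range(len(words)):
        subject, rest = words[:i], words[i:]
        if len(rest) != 1 or LEXICON.get(rest[0]) != "verb":
            continue
        if not subject:
            cost = NP_COST        # no subject in the string: assume the np
        elif is_np(subject):
            cost = 0              # subject proved from the grammar
        else:
            continue
        best = cost if best is None else min(best, cost)
    return best


print(parse_cost(["the", "boston", "office", "called"]))
print(parse_cost(["called"]))
```

A full sentence costs nothing on the syntactic side, while a bare verb is still accepted at the price of assuming its subject.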

More  radically,  we  can  revise  the  axiom  to 

(∀w₁,w₂,y,p,e,x) np(w₁,y)^$… ∧ verb(w₂,p)^$… ∧ p′(e,x)^$3 ∧ rel(x,y)^$20 ∧ Req(p,x)^$10
⊃ s(w₁ w₂, e)

This allows us to assume there is a verb as well, although for a higher cost than for
assuming a noun phrase (since presumably a verb phrase provides more evidence for the
existence of a sentence than a noun phrase does). That is, either the noun phrase or
the verb can constitute a sentence if the string of words is otherwise interpretable. In
particular, this allows us to handle cases of ellipsis, where the subject is given but the
verb is understood. In these cases we will not be able to prove Req(p,x) unless we first
identify p by proving p′(e,x). The solution to this problem is likely to come from salience
in context or from considerations of discourse coherence, such as recognizing a parallel
with a previous segment of the discourse.

Similarly,  axiom  (15)  can  be  rewritten  to  allow  omission  of  determiners,  as  is  also  very 
common  in  some  kinds  of  informal  discourse. 

6.3  Recognizing  the  Coherence  Structure  of  Discourse 

In Hobbs (1985d) a theory of discourse structure is outlined in which coherence relations
such as Parallel, Elaboration, and Explanation can hold between successive segments of
a discourse; when they hold, the two segments compose into a larger segment, giving
the discourse as a whole a hierarchical structure. The coherence relations can be defined
in terms of the information conveyed by the segments.

Insofar as the coherence relations can be defined precisely, it is relatively straightforward
to incorporate the theory into our method of interpretation as abduction. The
hierarchical structure can be entered by the axiom

(∀w,e) s(w,e) ⊃ Segment(w,e)

specifying  that  a  sentence  is  a  discourse  segment,  and  the  axiom 

(∀w₁,w₂,e₁,e₂,e) Segment(w₁,e₁) ∧ Segment(w₂,e₂) ∧ CoherenceRel(e₁,e₂,e)

⊃ Segment(w₁ w₂, e)

saying that if w₁ is a segment whose assertion or topic is e₁, and w₂ is a segment asserting
e₂, and a coherence relation holds between the content of w₁ and the content of w₂, then
w₁ w₂ is also a segment. The third argument e of CoherenceRel is the assertion or topic
of the composed segment, as determined by the definition of the particular coherence
relation.
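The two segment axioms support a bottom-up, CKY-style composition of a text into a tree of segments. In this hypothetical Python sketch each sentence is already a segment with a known assertion, and a small table stands in for the definitions of the coherence relations, giving the relation and the assertion of the composed segment for each pair of assertions. The function returns one analysis of a span of sentences, or None.

```python
from functools import lru_cache

# Hypothetical assertions of three one-sentence segments.
ASSERTIONS = ("e1", "e2", "e3")

# Toy stand-in for CoherenceRel(a1, a2, e): (a1, a2) -> (relation, e).
COHERENCE = {
    ("e1", "e2"): ("Explanation", "e1"),
    ("e1", "e3"): ("Explanation", "e1"),
}


@lru_cache(maxsize=None)
def segment(i, j):
    """Return (assertion, tree) for sentences i..j-1 as one segment, or None."""
    if j - i == 1:
        return ASSERTIONS[i], f"S{i + 1}"      # s(w, e) ⊃ Segment(w, e)
    for k in range(i + 1, j):                  # Segment(w1,e1) ∧ Segment(w2,e2)
        left, right = segment(i, k), segment(k, j)
        if left and right and (left[0], right[0]) in COHERENCE:
            rel, e = COHERENCE[(left[0], right[0])]
            return e, (rel, left[1], right[1])
    return None


print(segment(0, len(ASSERTIONS)))
```

With this table the first two sentences compose first, and the third attaches to the composed segment, whose assertion is still e1.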

To interpret a text W, one must then prove the expression

(∃e) Segment(W,e)

For example, Explanation is a coherence relation:

(∀e₁,e₂) Explanation(e₁,e₂) ⊃ CoherenceRel(e₁,e₂,e₁)

A  first  approximation  to  a  definition  for  Explanation  would  be  the  following: 

(∀e₁,e₂) cause(e₂,e₁) ⊃ Explanation(e₁,e₂)

That is, if what is asserted by the second segment could cause what is asserted by the first
segment, then there is an explanation relation between the segments. In explanations,
what is explained is the dominant segment, so the assertion of the composed segment
is simply the assertion of the first segment. (In fact, this is what “dominant segment”
means.) Hence, the third argument of CoherenceRel above is e₁.

Consider  a  variation  on  the  classic  example  from  Winograd  (1972): 

The police prohibited the women from demonstrating.

They  feared  violence. 

To  interpret  the  text  is  to  prove  abductively  the  expression 
Segment(“The police ... violence.”, e)

This involves proving that each sentence is a segment, by proving they are sentences, and
proving there is a coherence relation between them. To prove they are sentences, we would
tap into an expanded version of the sentence grammar of Section 6.1. This would require
us to prove abductively the logical form of the sentences.

One  way  to  prove  there  is  a  coherence  relation  between  the  sentences  is  to  prove  there 
is  an  Explanation  relation  between  them,  and  one  way  to  prove  that  is  to  prove  a  causal 
relation  between  their  assertions. 

After  back-chaining  in  this  manner,  we  are  faced  with  proving  the  expression 

(∃e₁,p,d,w,e₂,y,v,z) prohibit′(e₁,p,d) ∧ demonstrate′(d,w) ∧ cause(e₂,e₁)
∧ fear′(e₂,y,v) ∧ violent′(v,z)

That is, there is a prohibiting event e₁ by the police p of a demonstrating event d by the
women w. There is a fearing event e₂ by someone y (“they”) of violence v by someone z.
The fearing event e₂ causes the prohibiting event e₁. This expression is just the logical
forms of the two sentences, plus the hypothesized causal relation between them.

Suppose, plausibly enough, we have the following axioms:

(∀e₂,y,v) fear′(e₂,y,v) ⊃ (∃d₂) diswant′(d₂,y,v) ∧ cause(e₂,d₂)

That  is,  if  e2  is  a  fearing  by  y  of  v,  then  that  will  cause  the  state  d2  of  y  not  wanting  or 
“diswanting”  v. 

(∀d,w) demonstrate′(d,w) ⊃ (∃v,z) cause(d,v) ∧ violent′(v,z)

That  is,  demonstrations  cause  violence. 
(∀d,v,d₂,y) cause(d,v) ∧ diswant′(d₂,y,v) ⊃ (∃d₁) diswant′(d₁,y,d) ∧ cause(d₂,d₁)

That is, if someone y diswants v and v is caused by d, then that will cause y to diswant d
as well. If you don’t want the effect, you don’t want the cause.

(∀d₁,p,d) diswant′(d₁,p,d) ∧ authority(p) ⊃ (∃e₁) prohibit′(e₁,p,d) ∧ cause(d₁,e₁)

That  is,  if  those  in  authority  diswant  something,  that  will  cause  them  to  prohibit  it. 

(∀e₁,e₂,e₃) cause(e₁,e₂) ∧ cause(e₂,e₃) ⊃ cause(e₁,e₃)

That is, cause is transitive.

(∀p) police(p) ⊃ authority(p)

That  is,  the  police  are  in  authority. 

From these axioms, we can prove all of the above logical form except the propositions
police(p), demonstrate′(d,w), and fear′(e₂,y,v), which we assume. This is illustrated in
Figure 11. Notice that in the course of doing the proof, we unify y with p, thus resolving
the problematic pronoun reference that originally motivated this example. “They” refers
to the police.
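Because each axiom in this example is a Horn clause, the causal chain can be checked by forward chaining once the three assumptions are made. The Python sketch below grounds the axioms by hand, using hypothetical constants (e1, d1, d2, v, z) for the entities the existential quantifiers introduce, and it already identifies y with p, so it verifies the proof rather than discovering the coreference.

```python
# Hand-grounded axioms: (body literals, head literals), literals as tuples.
RULES = [
    ({("fear'", "e2", "p", "v")},
     {("diswant'", "d2", "p", "v"), ("cause", "e2", "d2")}),
    ({("demonstrate'", "d", "w")},
     {("cause", "d", "v"), ("violent'", "v", "z")}),
    ({("cause", "d", "v"), ("diswant'", "d2", "p", "v")},
     {("diswant'", "d1", "p", "d"), ("cause", "d2", "d1")}),
    ({("diswant'", "d1", "p", "d"), ("authority", "p")},
     {("prohibit'", "e1", "p", "d"), ("cause", "d1", "e1")}),
    ({("police", "p")}, {("authority", "p")}),
]


def closure(facts, rules):
    """Forward-chain the ground rules plus transitivity of cause to a fixpoint."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, heads in rules:
            if body <= facts and not heads <= facts:
                facts |= heads
                changed = True
        causes = [f for f in facts if f[0] == "cause"]
        for _, a, b in causes:               # cause(a,b) ∧ cause(b,c) ⊃ cause(a,c)
            for _, b2, c in causes:
                if b == b2 and ("cause", a, c) not in facts:
                    facts.add(("cause", a, c))
                    changed = True
    return facts


# The three assumptions, with "they" (y) already unified with the police p.
ASSUMED = {("police", "p"), ("demonstrate'", "d", "w"), ("fear'", "e2", "p", "v")}
LOGICAL_FORM = {("prohibit'", "e1", "p", "d"), ("demonstrate'", "d", "w"),
                ("cause", "e2", "e1"), ("fear'", "e2", "p", "v"),
                ("violent'", "v", "z")}

derived = closure(ASSUMED, RULES)
print(LOGICAL_FORM <= derived)
```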

One  can  imagine  a  number  of  variations  on  this  example.  V  we  had  not  included  the 
axiom  that  demonstrations  cause  violence,  we  would  have  had  to  assume  the  violence 
and  the  causal  relation  between  demonstrations  and  violence.  Moreover,  other  coherence 
relations  might  be  imagined  here  by  constructing  the  surrounding  context  in  the  right 
way.  It  could  be  followed  by  the  sentence  “But  since  they  had  never  demonstrated  before, 
they  did  not  know  that  violence  might  result.”  In  this  case,  the  second  sentence  would 
play  a  subordinate  role  to  the  third,  forcing  the  resolution  of  “they”  to  the  women.  Each 
example,  of  course,  has  to  be  analyzed  on  its  own,  and  changing  the  example  changes  the 
analysis. In Winograd’s original version of this example,

The  police  prohibited  the  women  from  demonstrating,  because  they  feared 
violence. 

the causality was explicit, thus eliminating the coherence relation as a source of ambiguity.
The literal cause(e2, e1) would be part of the logical form.

Consider  another  coherence  relation.  A  first  approximation  to  the  Elaboration  relation 
is  that  the  same  proposition  can  be  inferred  from  the  assertions  of  each  of  the  segments. 
At  some  level,  both  segments  say  the  same  thing.  In  our  notation,  this  can  be  captured 
by  the  relation  gen. 

$(\forall e_1, e_2, e)\; Elaboration(e_1, e_2, e) \supset CoherenceRel(e_1, e_2, e)$

$(\forall e_1, e_2, e)\; gen(e_1, e) \wedge gen(e_2, e) \supset Elaboration(e_1, e_2, e)$

That is, if there is an eventuality e that is “generated” by each of the eventualities e1 and
e2, then there is an Elaboration coherence relation between e1 and e2, and the assertion
of the composed segment will be e.

[Figure 11: Interpretation of “The police prohibited the women from demonstrating. They feared violence.”]

Let us consider a simple example:

Go down First Street. Follow First Street to A Street.

Note that it is important to recognize that this is an Elaboration, rather than two
temporally successive instructions.

To interpret the text, we must prove abductively the expression

$Segment(\text{“Go ... A Street.”}, e)$

To  prove  the  text  is  a  segment,  we  need  to  prove  each  sentence  is  a  segment,  by  proving  it  is 
a  sentence.  This  taps  us  into  an  expanded  version  of  the  sentence  grammar  of  Section  6.1, 
which requires us to prove the logical form of the sentences. We also need to prove there
is  a  coherence  relation  between  the  two  sentences.  Thus,  we  need  to  prove  (simplifying 
somewhat), 

$(\exists g, u, x, y, f, f_1)\; go'(g, u, x, y) \wedge down(g, FS) \wedge CoherenceRel(g, f, f_1) \wedge follow'(f, u, FS, AS)$

That is, there is a going g by u from x to y, and the going is down First Street (FS). There
is also a following f by u of First Street to A Street (AS). Finally, there is a coherence
relation between the going g and the following f, with the composite assertion f1.

Suppose we have the following axioms in our knowledge base:

$(\forall f)\; gen(f, f)$

That  is,  the  gen  relation  is  reflexive. 

$(\forall g, u, x, y, z)\; go'(g, u, x, y) \wedge along(g, z) \supset (\exists f)\; follow'(f, u, z, y) \wedge gen(g, f)$

That is, if g is a going by u from x to y and is along z, then g generates a following f by
u of z to y.

$(\forall g, z)\; down(g, z) \supset along(g, z)$

That is, a down relation is one kind of along relation.

If we assume go'(g, u, x, y) and down(g, FS), then the proof of the logical form of the
text is straightforward. It is illustrated in Figure 12.
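The proof for the direction-giving example can also be replayed mechanically. The sketch below is our own illustration rather than the abduction procedure: it forward-chains the axioms for along, go', gen, Elaboration, and CoherenceRel from the two assumed literals. The tuple encoding, the Skolem term standing for the following f, the function names, and the restriction of gen's reflexivity to the eventualities of go' and follow' literals (which keeps the closure finite) are all assumptions.

```python
def saturate(facts):
    """Forward-chain the direction-giving axioms to a fixpoint (sketch)."""
    facts = set(facts)
    while True:
        new = set()
        for a in facts:
            if a[0] == "down":  # down(g,z) -> along(g,z)
                new.add(("along", a[1], a[2]))
            if a[0] in ("go'", "follow'"):  # gen(e,e), restricted to these eventualities
                new.add(("gen", a[1], a[1]))
            if a[0] == "Elaboration":  # Elaboration(e1,e2,e) -> CoherenceRel(e1,e2,e)
                new.add(("CoherenceRel",) + a[1:])
            for b in facts:
                # go'(g,u,x,y) & along(g,z) -> follow'(f,u,z,y) & gen(g,f)
                if a[0] == "go'" and b[0] == "along" and b[1] == a[1]:
                    _, g, u, x, y = a
                    z = b[2]
                    fol = ("f", g, u, x, y, z)  # Skolem term for the following
                    new |= {("follow'", fol, u, z, y), ("gen", g, fol)}
                # gen(e1,e) & gen(e2,e) -> Elaboration(e1,e2,e)
                if a[0] == "gen" and b[0] == "gen" and a[2] == b[2]:
                    new.add(("Elaboration", a[1], b[1], a[2]))
        if new <= facts:
            return facts
        facts |= new


def explains(y):
    """Assume go'(g,u,x,y) and down(g,FS); check that follow'(f,u,FS,AS) and
    CoherenceRel(g,f,f1) follow for some f and f1."""
    facts = saturate({("go'", "g", "u", "x", y), ("down", "g", "FS")})
    return any(a[0] == "follow'" and a[2:] == ("u", "FS", "AS")
               and any(c[:3] == ("CoherenceRel", "g", a[1]) for c in facts)
               for a in facts)


print(explains("AS"))  # True: the destination y is unified with A Street
print(explains("y"))   # False: the derived following does not end at A Street
```

Only when the destination y of the going is unified with A Street does the derived following match follow'(f, u, FS, AS); the Elaboration then comes from gen(g, f) and gen(f, f), with f itself as the composite assertion.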

In  Hobbs  (1991)  there  is  an  example  of  the  recognition  of  a  Contrast  relation,  following 
essentially  the  same  lines  and  resulting  in  the  interpretation  of  a  simple  metaphor. 

This  approach  has  the  flavor  of  discourse  grammar  approaches.  What  has  always  been 
the problem with discourse grammars is that their terminal symbols (e.g., Introduction)
and sometimes their compositions have not been computable. Because in our abductive,
inferential approach we are able to reason about the content of the utterances of the
discourse, this problem no longer exists.

[Figure 12: Interpretation of “Go down First Street. Follow First Street to A Street.”]

A second possible approach to some aspects of discourse structure already falls out of
what was presented in the first part of this article. In 1979, Hobbs published an article
entitled “Coherence and Coreference”, in which it was argued that coreference problems
are often solved as a by-product of recognizing coherence. However, one can turn this
observation on its head and see the coherence structure of the text as a kind of
higher-order coreference, in a manner similar to the approach of Lockman and Klapholz (1980)
and Lockman (1978). Where we see two sentences as being in an Elaboration relation, for
example,  it  is  because  we  have  inferred  the  same  eventuality  from  the  assertions  of  the 
two  sentences.  Thus,  from  both  of  the  sentences 

John  can  open  Bill’s  safe. 

He  knows  the  combination. 

we  infer  that  there  is  some  action  that  John/he  can  do  that  will  cause  the  safe  to  be  open. 
But  we  may  also  view  the  eventuality  described  by  the  second  sentence  as  inferrable  from 
the eventuality described by the first, as long as certain assumptions are made. From this
point of view, recognizing elaborations looks very much like ordinary reference resolution,
as described in Sections 3 and 5. In Figure 12, if everything above the literals s(“Go down
First Street.”, g) and s(“Follow ... A Street.”, f) is ignored, the content of the second
sentence still follows from the content of the first.

Causal relations can be treated similarly. Axioms would tell us in a general way what
kinds of things cause and are caused by what. In

John  slipped  on  a  banana  peel, 
and  broke  his  back. 
we cannot infer the entire content of the second clause from the first, but we know in a
general  way  that  slipping  tends  to  cause  falls,  and  falls  tend  to  cause  injuries.  If  we  take 
the  second  clause  to  contain  an  implicit  definite  reference  to  an  injury,  we  can  recover 
the  causal  relation  between  the  two  events,  and  the  remainder  of  the  specific  information 
about  the  injury  is  new  information  and  can  be  assumed. 

Recognizing  parallelism  is  somewhat  more  complex,  but  perhaps  it  can  be  seen  as  a 
kind  of  definite  reference  to  types. 

A  disadvantage  of  this  approach  to  discourse  coherence  is  that  it  does  not  yield  the 
large-scale  coherence  structure  of  the  discourse  that  we  are  able  to  derive  in  the  approach 
based  on  coherence  relations.  This  is  important  because  the  coherence  structure  structures 
the  context  against  which  subsequent  sentences  are  interpreted. 

The  coreference  view  of  coherence  is  in  no  way  incompatible  with  the  structural  view. 
We  can  both  recognize  the  coherence  structure  and  recognize  the  implicit  definite  references 
that  rely  on  much  the  same  knowledge. 

We  have  illustrated  an  abductive  approach  to  discourse  structure  based  on  Hobbs’s 
coherence  relations.  But  any  other  sufficiently  precise  theory  of  discourse  structure,  such 
as  Rhetorical  Structure  Theory  (Mann  and  Thompson,  1986),  can  be  treated  in  a  similar 
fashion. 

We  should  point  out  a  subtle  shift  of  perspective  we  have  just  gone  through  in  Section 
6.  In  the  first  five  sections  of  this  article,  the  problem  of  interpretation  was  viewed  as 
follows: One is given certain observable facts, namely, the logical form of the sentence,
and  one  has  to  find  a  proof  that  demonstrates  why  they  are  true.  In  this  section,  we 
no  longer  set  out  to  prove  the  observable  facts.  Rather  we  set  out  to  prove  that  we  are 
viewing  a  coherent  situation,  and  it  is  built  into  the  rules  that  specify  what  situations  are 
coherent  that  an  explanation  must  be  found  for  the  observable  facts.  We  return  to  this 
point  in  Section  8.3  and  in  the  conclusion. 

6.4  Integration  versus  Modularity 

For the past several decades, there has been quite a bit of discussion in linguistics,
psycholinguistics, and related fields about the various modules involved in language processing
and their interactions. A number of researchers have, in particular, been concerned to show
that there was a syntactic module that operated in some sense independently of processes
that accessed general world knowledge. Fodor (1983) has been perhaps the most vocal
advocate of this position. He argues that human syntactic processing takes place in a
special “informationally encapsulated” input module, immune from top-down influences from
“central processes” involving background knowledge. This position has been contentious
in psycholinguistics. Marslen-Wilson and Tyler (1987), for example, present evidence that
if there is any information encapsulation, it is not in a module that has logical form as its
output, but rather one that has a mental model or some other form of discourse
representation as its output. Such output requires background knowledge in its construction. At
the very least, if linguistic processing is modular, it is not immune from top-down context
dependence.

Finally, however, Marslen-Wilson and Tyler argue that the principal question about
modularity, “What interaction occurs between modules?”, is ill-posed. They suggest
that  there  may  be  no  neat  division  of  the  linguistic  labor  into  modules,  and  that  it  therefore 
does  not  make  sense  to  talk  about  interaction  between  modules.  This  view  is  very  much 
in  accord  with  the  integrated  approach  we  have  presented  here.  Knowledge  of  syntax  is 
just one kind of knowledge of the world. All is given a uniform representation. Any rule
used  in  discourse  interpretation  can  in  principle,  and  often  in  fact  will,  involve  predications 
about  syntactic  phenomena,  background  knowledge,  the  discourse  situation,  or  anything 
else.  In  such  an  approach,  issues  of  modularity  simply  go  away. 

In one extended defense of modularity, Fodor (n.d.) begins by admitting that the
arguments against modularity are powerful. “If you’re a modularity theorist, the fundamental
problem in psycholinguistics is to talk your way out of the massive effects of context on
language comprehension” (p. 15). He proceeds with a valiant attempt to do just that.
He begins with an assumption: “Since a structural description is really the union of
representations of an utterance in a variety of different theoretical vocabularies, it’s natural
to assume that the internal structure of the parsers is correspondingly functionally
differentiated” (p. 10). But in our framework, this assumption is incorrect. Facts about
syntax and pragmatics are expressed in different theoretical vocabularies only in the sense
that facts about doors and airplanes are expressed in different theoretical vocabularies:
different predicates are used. But the “internal structure of the parsers” is the same. It
is all abduction.

In discussing certain sentences in which readers are “garden-pathed” by applying the
syntactic strategy of “minimal attachment”, Fodor proposes two alternatives, the first
interactionist and the second modular: “Does context bias by penetrating the parser and
suspending its (putative) preference for minimal attachment? Or does it bias by correcting
the output of the parser when minimal attachment yields implausible analyses?” (p. 37)
In our view, neither of these is true. The problem is to find the interpretation of the
utterance that best satisfies a set of syntactic, semantic, and pragmatic constraints. Thus,
all the constraints are applied simultaneously and the best interpretation satisfying them
all is selected.

Moreover, often the utterance is elliptical, obscure, ill-formed, or unclear in parts. In
these cases, various interpretive moves are available to the hearer, among them the local
pragmatics moves of assuming metonymy or metaphor, the lexical move of assuming a
very low-salience sense of a word, and the syntactic move of inserting a word to repair the
syntax. The last of these is required in a sentence from a circulated rough draft of
Fodor’s paper:

By  contrast,  on  the  Interactive  model,  it’s  assumed  that  the  same  processes 
have  access  to  linguistic  information  can  also  access  cognitive  background. 

(p.  57-8) 

The best way to interpret this sentence is to assume that a “that” should occur between
“processes” and “have”. There is no way of knowing a priori what interpretive moves will
yield the best interpretation for a given utterance. This fact would dictate that syntactic
analysis be completed even where purely pragmatic processes could repair the utterance
to interpretability.
In Bever’s classic example (Bever, 1970),

The  horse  raced  past  the  barn  fell. 

there are at least two possible interpretive moves: insert an “and” between “barn” and
"fell”,  or  assume  the  rather  low-frequency,  causative  sense  of  "race”.  People  generally 
make  the  first  of  these  moves.  However,  Fodor  himself  gives  examples,  such  as 

The  performer  sent  the  flowers  was  very  pleased. 

in  which  no  such  low-frequency  sense  needs  to  be  accessed  and  the  sentence  is  more  easily 
interpreted  as  grammatical. 

Our  approach  to  this  problem  is  in  the  spirit  of  Crain  and  Steedman  (1985),  who  argue 
that  interpretation  is  a  matter  of  minimizing  the  number  of  presuppositions  it  is  necessary 
to assume are in effect. Such assumptions add to the cost of the interpretation.

There remains, of course, the question of the optimal order of search for a proof
for any particular input text. As pointed out in Section 6.1, the various proposals of
modularizations can be viewed as suggestions for order of search. But in our framework,
there is no particular reason to assume a rigid order of search. It allows what seems to us
the most plausible account: that sometimes syntax drives interpretation and sometimes
pragmatics does.

It should be pointed out that if Fodor were to adopt our position, it would only be
with the utmost pessimism. According to him, we would have taken a peripheral, modular
process that is, for just that reason, perhaps amenable to investigation, and turned it into
one of the central processes, the understanding of which, on his view, would be completely
intractable.  However,  it  seems  to  us  that  nothing  can  be  lost  in  this  move.  Insofar  as 
syntax  is  tractable  and  the  syntactic  processing  can  be  traced  out,  this  information  can 
be  treated  as  information  about  efficient  search  orders  in  the  central  processes. 

Finally, the reader may object to this integration because syntax and the other so-called
modules constitute coherent domains of inquiry, and breaking down the barriers
between them can only result in conceptual confusion. This is not a necessary consequence,
however. One can still distinguish, if one wants, between linguistic axioms such as (12)
and background knowledge axioms such as (8). It is just that they will both be expressed
in the same formal language and used in the same fashion. What the integration has done
is to remove such distinctions from the code and put them into the comments.

7  Relation  to  Other  Work 

7.1 Previous and Current Research on Abduction in AI

The term “abduction” was first used by C. S. Peirce (e.g., 1955), who also called the
process “retroduction”. His definition of it is as follows:

The  surprising  fact,  C,  is  observed; 

But  if  A  were  true,  C  would  be  a  matter  of  course. 

Hence,  there  is  reason  to  suspect  that  A  is  true.  (p.  151) 
Peirce’s C is what we have been calling q(A) and his A is what we have been calling p(A).
To say “if A were true, C would be a matter of course” is to say that for all x, p(x) implies
q(x), that is, $(\forall x)\; p(x) \supset q(x)$. He goes on to describe what he refers to as “abductory
induction”. In our terms, this is when, after abductively hypothesizing p(A), one checks
a number of, or a random selection of, properties $q_i$ such that $(\forall x)\; p(x) \supset q_i(x)$, to see
whether $q_i(A)$ holds. This, in a way, corresponds to our check for consistency. Then Peirce
says that “in pure abduction, it can never be justifiable to accept the hypothesis otherwise
than as an interrogation”, and that “the whole question of what one out of a number of
possible hypotheses ought to be entertained becomes purely a question of economy.” This
corresponds to our evaluation scheme.
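Peirce's schema and his “abductory induction” can be made concrete in a few lines. This Python sketch assumes propositional rules about a single individual A, written as (p, q) pairs, and a caller-supplied test `holds` for whether a predicted property q_i(A) is true; the predicates and function names are hypothetical, and the choice among surviving hypotheses, Peirce's question of economy, is not modeled.

```python
# Hypothetical rules (forall x) p(x) -> q(x), written as (p, q) pairs.
RULES = [("rained", "wet_grass"),
         ("sprinkler_ran", "wet_grass"),
         ("rained", "wet_street")]


def hypotheses(observed, rules):
    """Peirce's schema: from q(A) and (forall x) p(x) -> q(x), suspect p(A)."""
    return [p for p, q in rules if q == observed]


def abductory_induction(hypothesis, rules, holds):
    """Check every other property q_i(A) that the hypothesis p(A) predicts."""
    return all(holds(q) for p, q in rules if p == hypothesis)


found = {"wet_grass"}  # properties of A found to hold; wet_street does not
survivors = [h for h in hypotheses("wet_grass", RULES)
             if abductory_induction(h, RULES, lambda q: q in found)]
print(survivors)  # ['sprinkler_ran']
```

Both rained and sprinkler_ran explain the wet grass, but only the second survives the check of its further consequences.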

The  earliest  formulation  of  abduction  in  artificial  intelligence  was  by  Morgan  (1971). 
He  showed  how  a  complete  set  of  truth-preserving  rules  for  generating  theorems  could  be 
turned  into  a  complete  set  of  falsehood-preserving  rules  for  generating  hypotheses. 

The  first  application  of  abduction  in  artificial  intelligence  was  by  Pople  (1973),  in  the 
context  of  medical  diagnosis.  He  gave  the  formulation  of  abduction  that  we  have  used 
and  showed  how  it  can  be  implemented  in  a  theorem-proving  framework.  Literals  that  are 
“abandoned by deduction in the sense that they fail to have successor nodes” (p. 150) are
taken  as  the  candidate  hypotheses.  Those  hypotheses  are  best  that  account  for  the  most 
data,  and  in  service  of  this  principle,  he  introduced  factoring  or  synthesis,  which,  just  as 
in  our  scheme,  attempts  to  unify  goal  literals.  Hypotheses  where  this  is  used  are  favored. 
No further scoring criteria are given, however.

Work on abduction in artificial intelligence was revived in the early 1980s at several
sites. Reggia and his colleagues (e.g., Reggia et al., 1983; Reggia, 1985) formulated
abductive inference in terms of parsimonious covering theory. One is given a set of disorders
(our p(A)’s) and a set of manifestations (our q(A)’s) and a set of causal relations between
disorders and manifestations (our rules of the form $(\forall x)\; p(x) \supset q(x)$). An explanation
for any set of manifestations is a set of disorders which together can cause all of the
manifestations. The minimal explanation is the best one, where minimality can be defined
in terms of cardinality or irredundancy. More recently, Peng and Reggia (1987a, 1987b)
have begun to incorporate probabilistic considerations into their notion of minimality. For
Reggia, the sets of disorders and manifestations are distinct, as is appropriate for medical
diagnosis, and there is no backward-chaining to deeper causes; our abduction method is
more general than his in that we can assume any proposition: one of the manifestations
or an underlying cause of arbitrary depth.
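The two notions of minimality in parsimonious covering can be compared with a brute-force search. This is a sketch for illustration only (it is exponential in the number of disorders) and not Reggia's algorithm; the disorder and manifestation names and the function names are hypothetical.

```python
from itertools import combinations


def covers(disorders, manifestations, causes):
    """True if the disorders can jointly cause every manifestation."""
    caused = set().union(*(causes[d] for d in disorders))
    return manifestations <= caused


def minimum_cardinality_covers(manifestations, causes):
    """All covers of the smallest possible size."""
    names = sorted(causes)
    for k in range(len(names) + 1):
        found = [set(c) for c in combinations(names, k)
                 if covers(c, manifestations, causes)]
        if found:
            return found
    return []


def irredundant_covers(manifestations, causes):
    """All covers from which no disorder can be dropped (no proper subset covers)."""
    names = sorted(causes)
    return [set(c) for k in range(len(names) + 1)
            for c in combinations(names, k)
            if covers(c, manifestations, causes)
            and not any(covers([d for d in c if d != x], manifestations, causes)
                        for x in c)]


# Hypothetical disorders and the manifestations each can cause.
CAUSES = {"d1": {"m1", "m2", "m3"}, "d2": {"m1"}, "d3": {"m2"}, "d4": {"m3"}}
M = {"m1", "m2", "m3"}
print(minimum_cardinality_covers(M, CAUSES))  # [{'d1'}]
print(irredundant_covers(M, CAUSES))
```

The single disorder d1 is the only minimum-cardinality cover, yet {d2, d3, d4} is also irredundant, since dropping any one of its members leaves a manifestation unexplained. Checking single removals suffices because coverage is monotone in the set of disorders.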

In their textbook, Charniak and McDermott (1985) present the basic pattern of
abduction and then discuss many of the issues involved in trying to decide among
alternative hypotheses on probabilistic grounds. Reasoning under uncertainty and its application
to expert systems are presented as examples of abduction.

Cox and Pietrzykowski (1986) present a formulation in a theorem-proving framework
that is very similar to Pople’s, though apparently independent. It is especially valuable
in that it considers abduction abstractly, as a mechanism with a variety of possible
applications, and not just as a handmaiden to diagnosis. The test used to select a suitable
hypothesis is that it should be what they call a “dead end”; that is, it should not be
possible to find a stronger consistent assumption by backward-chaining from the hypothesis
using the axioms in the knowledge base. The dead-end test forces the abductive reasoning
system to overcommit, that is, to produce overly specific hypotheses. This is a problem,
since it often does not seem reasonable to accept any of a set of very specific assumptions
as the explanation of the fact that generated them by backward-chaining in the
knowledge base. More backward-chaining is not necessarily better. Moreover, the location of
these dead ends is often a rather superficial and incidental feature of the knowledge base
that has been constructed. It is in part to overcome such objections that we devised our
weighted abduction scheme.
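A simplified dead-end test shows how the chosen hypothesis depends on where the knowledge base happens to stop. This sketch assumes propositional, acyclic rules mapping a literal to the literals that could each explain it, and it omits the consistency check that Cox and Pietrzykowski require; the knowledge base and the function name are hypothetical.

```python
def dead_ends(goal, rules):
    """Backward-chain from goal; return the literals no rule explains further.

    rules maps a literal to the literals that could each explain it
    (propositional and acyclic; consistency checking is omitted).
    """
    explanations = rules.get(goal, [])
    if not explanations:
        return {goal}
    return set().union(*(dead_ends(e, rules) for e in explanations))


kb = {"wet_grass": ["rained", "sprinkler_ran"],
      "sprinkler_ran": ["timer_fired"]}
print(sorted(dead_ends("wet_grass", kb)))  # ['rained', 'timer_fired']

kb["timer_fired"] = ["timer_misprogrammed"]  # add one more axiom
print(sorted(dead_ends("wet_grass", kb)))  # ['rained', 'timer_misprogrammed']
```

Adding one axiom moves the hypothesis deeper, even though nothing about the observation has changed.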

In recent years there has been an explosion of interest in abduction in artificial
intelligence. Some recent formal approaches are those of Reiter and de Kleer (1987), Levesque
(1989), and Poole (1991). A good overview of recent research on abduction can be obtained
from O’Rorke (1990).

In many of the applications of abduction to diagnosis, it is assumed that the relations
expressed by the rules are all causal, and in fact Josephson (1990a) has argued that that
is necessarily the case in explanation. It seems to us that when one is diagnosing physical
devices, explanations must of course be in terms of physical causality. But when we
are working within an informational system, such as language or mathematics, then the
relations are implicational and not necessarily causal.

7.2  Inference  in  Natural  Language  Understanding 

The problem of using world knowledge in the interpretation of discourse, and in particular
of drawing the appropriate inferences, has been investigated by a number of researchers for
the last two decades. Among the earliest work was that of Rieger (Rieger, 1974; Schank,
1975). He and his colleagues implemented a system in which a sentence was mapped into
an underlying representation on the basis of semantic information, and then all of the
possible inferences that could be drawn were drawn. Where an ambiguity was present,
those interpretations were best that yielded the most inferences. Rieger’s work was seminal
in that of those who appreciated the importance of world knowledge in text interpretation,
his implementation was probably the most general and on the largest scale. But because
he imposed no constraints on what inferences should be drawn, his method was inherently
combinatorially explosive.

Recent  work  by  Sperber  and  Wilson  (1986)  takes  an  approach  very  similar  to  Rieger’s. 
They  present  a  noncomputational  attempt  to  characterize  the  relevance  of  utterances 
in discourse. They first define a contextual implication of some new information, say,
that provided by a new utterance, to be a conclusion that can be drawn from the new
information plus currently highlighted background knowledge but that cannot be drawn
from either alone. An utterance is then relevant to the extent, essentially, that it has a
large number of easily derived contextual implications. To extend this to the problem of
interpretation, we could say that the best interpretation of an ambiguous utterance is the
one that gives it the greatest relevance in the context.
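Contextual implication, as defined above, is easy to compute for propositional Horn rules. The sketch below is our own illustration with hypothetical rules and function names; it ignores how easily an implication is derived, so the size of its result is only a crude stand-in for relevance.

```python
def closure(facts, rules):
    """Forward-chain propositional Horn rules (premises, conclusion) to a fixpoint."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


def contextual_implications(new, background, rules):
    """Conclusions drawable from new + background but from neither alone."""
    return (closure(new | background, rules)
            - closure(new, rules) - closure(background, rules))


# Hypothetical rules and context.
RULES = [({"raining"}, "streets_wet"),
         ({"raining", "no_umbrella"}, "speaker_gets_wet")]
print(contextual_implications({"raining"}, {"no_umbrella"}, RULES))
# {'speaker_gets_wet'}
```

Here streets_wet follows from the new information alone, so it does not count as a contextual implication.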

In the late 1970s and early 1980s, Roger Schank and his students scaled back from the
ambitious program of Rieger. They adopted a method for handling extended text that
combined keywords and scripts. The text was scanned for particular keywords which were
used to select the pre-stored script that was most likely to be relevant. The script was
then  used  to  guide  the  rest  of  the  processing.  This  technique  was  used  in  the  FRUMP 
program  (DeJong,  1977;  Schank  et  al.,  1980)  for  summarizing  stories  on  the  Associated 
Press  news  wire  that  dealt  with  terrorist  incidents  and  with  disasters.  Unconstrained 
inference  was  thereby  avoided,  but  at  a  cost.  The  technique  was  necessarily  limited  to 
very  narrow  domains  in  which  the  texts  to  be  processed  described  stereotyped  scenarios 
and  in  which  the  information  was  conveyed  in  stereotyped  ways.  The  more  one  examines 
even the seemingly simplest examples of spoken or written discourse, the more one realizes
that very few cases satisfy these criteria.

In  what  can  be  viewed  as  an  alternative  response  to  Rieger’s  project,  Hobbs  (1980) 
proposed  a  set  of  constraints  on  the  inferences  that  should  be  drawn  in  knowledge-based 
text  processing:  those  inferences  should  be  drawn  that  are  required  for  the  most  economical 
solution  to  the  discourse  problems  posed  by  the  text.  These  problems  include  interpreting 
vague  predicates,  resolving  definite  references,  discovering  the  congruence  of  predicates 
and their arguments, discovering the coherence relations among adjacent segments of text,
and detecting the relation of the utterances to the speaker’s or writer’s overall plan. For
each problem a discourse operation was defined, characterizing the forward and backward
inferences that had to be drawn for that problem to be solved.

The difference in approaches can be characterized briefly as follows: The Rieger and the
Sperber and Wilson models assume the unrestricted drawing of forward inferences, and the
best interpretation of a text is the one that maximizes this set of inferences. The selective
inferencing model posits certain external constraints on what counts as an interpretation,
namely, that certain discourse problems must be solved, and the best interpretation is
the set of inferences, some backward and some forward, that satisfies these constraints
most economically. In the abductive model, there is only one constraint, namely, that
the text must be explained, and the best interpretation is the set of backward inferences
that does this most economically. Whereas Rieger and Sperber and Wilson were
forward-chaining from the text and trying to maximize implications, we are backward-chaining
from the text and trying to minimize assumptions.

7.3  Abduction  in  Natural  Language  Understanding 

Grice (1975) introduced the notion of “conversational implicature” to handle examples
like  the  following: 

A:  How  is  John  doing  on  his  new  job  at  the  bank? 

B:  Quite  well.  He  likes  his  colleagues  and  he  hasn’t  embezzled  any  money  yet. 

Grice argues that in order to see this as coherent, we must assume, or draw as a
conversational implicature, that both A and B know that John is dishonest. An implicature can
be  viewed  as  an  abductive  move  for  the  sake  of  achieving  the  best  interpretation. 

Lewis (1979) introduces the notion of “accommodation” in conversation to explain the
phenomenon that occurs when you “say something that requires a missing presupposition,
and straightaway that presupposition springs into existence, making what you said
acceptable  after  all.”  The  hearer  accommodates  the  speaker. 
Thomason (1985) argued that Grice’s conversational implicatures are based on Lewis’s
rule  of  accommodation.  We  might  say  that  implicature  is  a  procedural  characterization  of 
something  that,  at  the  functional  or  interactional  level,  appears  as  accommodation.  When 
we  do  accommodation,  implicature  is  what  our  brain  does. 

Hobbs (1979) recognized that many cases of pronoun reference resolution were in fact
conversational implicatures, drawn in the service of achieving the most coherent
interpretation of a text. Hobbs (1983a) gave an account of the interpretation of a spatial metaphor
as a process of backward-chaining from the content of the utterance to a more specific
underlying proposition, although the details are vague. Hobbs (1982b) showed how the
notion of implicature can solve many problematic cases of definite reference. However, in
none of this work was there a recognition of the pervasive role of abductive explanation
in discourse interpretation.

A more thorough-going early use of abduction in natural language understanding was
in the work of Norvig (1983, 1987), Wilensky (1983; Wilensky et al., 1988), and their
associates. They propose an operation of “concretion”, one of many that take place in the
processing of a text. It is a “kind of inference in which a more specific interpretation of
an utterance is made than can be sustained on a strictly logical basis” (Wilensky et al.,
1988, p. 50). Thus, “to use a pencil” generally means to write with a pencil, even though
one could use a pencil for many other purposes. The operation of concretion works as
follows: “A concept represented as an instance of a category is passed to the concretion
mechanism. Its eligibility for membership in a more specific subcategory is determined by
its ability to meet the constraints imposed on the subcategory by its associated relations
and aspectual constraints. If all applicable conditions are met, the concept becomes an
instance of the subcategory” (ibid.). In the terminology of our schema,

From $q(A)$ and $(\forall x)\, p(x) \supset q(x)$, conclude $p(A)$,

A is the concept, q is the higher category, and p is the more specific subcategory. Whereas
Wilensky et al. view concretion as a special and somewhat questionable inference from
q(A), in the abductive approach it is a matter of determining the best explanation for q(A).
The “associated relations and aspectual constraints” are other consequences of p(A). In
part, checking these is checking for the consistency of p(A). In part, it is being able to
explain the most with the least.
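The concretion schema can be illustrated with a minimal Python sketch. It assumes a toy encoding in which each hypothetical axiom (∀x)p(x) ⊃ c(x) lists the higher category q together with the other consequences of p. A candidate p is dropped when one of its consequences is known to be false, which is a crude stand-in for the consistency check, and the surviving candidates are ranked by how many observations they explain. The predicate names (`use_pencil`, `write_with_pencil`, and so on) are illustrative only, and the ranking rule is a simplification rather than the weighted scheme itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axiom:
    """Stands for (forall x) p(x) -> c(x) for every c in consequences."""
    antecedent: str          # the more specific subcategory p
    consequences: frozenset  # q plus the "associated relations"


def concretions(observed, known_false, axioms):
    """Rank candidate subcategories p for an entity A, best explanation first.

    observed:    predicates known to hold of A, including q(A)
    known_false: predicates known not to hold of A
    """
    scored = []
    for ax in axioms:
        if ax.consequences & known_false:
            continue  # assuming p(A) would contradict what is known
        explained = ax.consequences & observed
        if explained:
            scored.append((len(explained), ax.antecedent))
    # Explain the most with the least: more observations covered ranks higher;
    # ties are broken alphabetically so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


axioms = [
    Axiom("write_with_pencil", frozenset({"use_pencil", "marks_paper"})),
    Axiom("pry_with_pencil", frozenset({"use_pencil", "moves_object"})),
]
print(concretions({"use_pencil", "marks_paper"}, set(), axioms))
```

Here the writing reading wins because it accounts for both the use of the pencil and the marks on the paper, while the prying reading accounts for only one of them.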

Norvig (1987), in particular, describes this process in terms of marker passing in a
semantic net framework, deriving originally from Quillian (1968). Markers are passed
from node to node, losing energy with each pass, until they run out of energy. When two
markers collide, the paths they followed are inspected, and if they are of the right shape,
they constitute the inferences that are drawn. Semantic nets express implicative relations,
and their links can as easily be expressed as axioms. Hierarchical relations correspond to
axioms of the form

$$(\forall x)\, p(x) \supset q(x)$$

and  slots  correspond  to  axioms  of  the  form 

$$(\forall x)\, p(x) \supset (\exists y)\, q(y, x) \wedge r(y)$$
Marker passing therefore is equivalent to forward- and backward-chaining in a set of
axioms. Although we do no forward-chaining, the use of “et cetera” propositions described
in Section 4 accomplishes the same thing. Norvig’s “marker energy” corresponds to our
costs; when the weights on antecedents sum to greater than one, that means cost is
increasing and hence marker energy is decreasing. Norvig’s marker collision corresponds to
our factoring. We believe ours is a more compelling account of interpretation. There is really
no justification for the operation of marker passing beyond the pretheoretic psychological
notion that there are associations between concepts and one concept reminds us of another.
And there is no justification at all for why marker collision is what should determine the
inferences that are drawn and hence the interpretation of the text. In our formulation,
by contrast, the interpretation of a text is the best explanation of why it would be true,
“marker passing” is the search through the axioms in the knowledge base for a proof, and
“marker collision” is the discovery of redundancies that yield more economic explanations.

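The correspondence between marker energy and cost can be sketched numerically. The Python below assumes the usual weighted-abduction rule that an antecedent carrying weight w inherits w times the assumption cost of the goal it is used to prove; the function names and the uniform weights are hypothetical. When the weights sum to more than one, the total cost of what must be assumed grows with each backward-chaining step, the analogue of a marker losing energy; when they sum to less than one, it shrinks.

```python
def antecedent_costs(goal_cost, weights):
    """Costs passed to the antecedents P1^w1 & ... & Pn^wn of an axiom
    used to backward-chain on a goal whose assumption cost is goal_cost."""
    return [w * goal_cost for w in weights]


def cost_after(goal_cost, weights, steps):
    """Total assumption cost after `steps` rounds of backward-chaining, using
    the same weights at every round (a deliberately uniform toy)."""
    total = goal_cost
    for _ in range(steps):
        total = sum(antecedent_costs(total, weights))
    return total


print(cost_after(8, [0.75, 0.5], 2))  # weights sum to 1.25: cost grows to 12.5
print(cost_after(8, [0.5, 0.25], 2))  # weights sum to 0.75: cost shrinks to 4.5
```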
Charniak and his associates have also been working out the details of an abductive
approach to interpretation for a number of years. Charniak (1986) expresses the
fundamental insight: “A standard platitude is that understanding something is relating it to
what one already knows. . . . One extreme example would be to prove that what one is
told must be true on the basis of what one already knows. . . . We want to prove what one
is told given certain assumptions.”

To compare Charniak’s approach with ours, it is useful to examine in detail one of his
operations, that for resolving definite references. In Charniak and Goldman (1988) the
rule is given as follows:

    (inst ?x ?frame) →
        (OR (PExists (?y : ?frame) (== ?x ?y))^…
            (→OR (role-inst ?x ?superfrm ?slot)
                 (Exists (?s : ?superfrm)
                         (== (?slot ?s) ?x)))^…)

For  the  sake  of  concreteness,  we  will  look  at  the  example 

John  bought  a  new  car.  The  engine  is  already  acting  up. 

where the problem is to resolve “the engine”. For the sake of comparing Charniak and
Goldman’s approach with ours, let us suppose we have the axiom

$$(\forall y)\, car(y) \supset (\exists x)\, engine\text{-}of(x, y) \wedge engine(x) \qquad (16)$$

That is, if y is a car, then there is an engine x which is the engine of y. The relevant
portion of the logical form of the second sentence is

$$(\exists \ldots, x, \ldots)\, \ldots \wedge engine(x) \wedge \ldots$$

and after the first sentence has been processed, car(C) is in the knowledge base.

Now, Charniak and Goldman’s expression (inst ?x ?frame) says that an entity ?x,
say, the engine, is an instance of a frame ?frame, such as the frame engine. In our
terminology, this is simply engine(x). The first disjunct in the conclusion of the rule says
that a ?y instantiating the same frame previously exists (PExists) in the text and is equal
to (or the best name for) the mentioned engine. For us, that corresponds to the case
where we already know engine(E) for some E. In the second disjunct, the expression
(role-inst ?x ?superfrm ?slot) says that ?x is a possible filler for the ?slot slot in
the frame ?superfrm, as the engine x is a possible filler for the engine-of slot in the car
frame. In our formulation, that corresponds to backward-chaining using axiom (16) and
finding the predicate car. The expression

    (Exists (?s : ?superfrm) (== (?slot ?s) ?x))

says that some entity ?s instantiating the frame ?superfrm must exist, and its ?slot slot
is equal to (or the best name for) the definite entity ?x. So in our example, we need to
find a car whose existence is known or can be inferred. The operator →OR tells us to infer
its first argument in all possible ways and then to prove its second argument with one of
the resulting bindings. The superscripts on the disjuncts are probabilities that result in
favoring the first over the second, thereby favoring shorter proofs. The two disjuncts of
Charniak and Goldman’s rule therefore correspond to the two cases of not having to use
axiom (16) in the proof of the engine’s existence and having to use it.
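The correspondence between the two disjuncts and the two kinds of proof can be sketched in a few lines of Python. This is a hypothetical toy, not code from either system: the knowledge base is a set of (predicate, entity) pairs, each slot axiom in the style of axiom (16) is recorded as a (frame, slot) pair, and a referent reached through a slot axiom is given the made-up name slot(B). Candidates are ordered by proof length, which plays the role of the probabilities that favor the first disjunct.

```python
def resolve_definite(pred, kb, slot_axioms):
    """Candidate referents for a definite noun phrase "the <pred>".

    kb:          set of (predicate, entity) facts already known
    slot_axioms: maps pred to (frame, slot) pairs, each standing for an axiom
                 (forall y) frame(y) -> (exists x) slot(x, y) & pred(x)
    Returns (proof_length, referent) pairs, shorter proofs first.
    """
    # First disjunct: an entity E with pred(E) is already known.
    candidates = [(0, e) for p, e in sorted(kb) if p == pred]
    # Second disjunct: backward-chain on a slot axiom to a known frame instance B;
    # the referent is then the filler of B's slot, written here as slot(B).
    for frame, slot in slot_axioms.get(pred, []):
        candidates += [(1, f"{slot}({b})") for p, b in sorted(kb) if p == frame]
    return sorted(candidates)


slot_axioms = {"engine": [("car", "engine-of"), ("truck", "engine-of")]}
print(resolve_definite("engine", {("car", "C")}, slot_axioms))  # [(1, 'engine-of(C)')]
```

With car(C) in the knowledge base and no engine mentioned earlier, the only candidate comes from the second disjunct, that is, from backward-chaining on axiom (16).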

There are two ways of viewing the difference between Charniak and Goldman’s
formulation and ours. The first is that whereas they must explicitly state complex rules
for definite reference, lexical disambiguation, case disambiguation, plan recognition, and
other discourse operations in a complex metalanguage, we simply do backward-chaining
on a set of axioms expressing our knowledge of the world. Their rules can be viewed as
descriptions of this backward-chaining process: If you find r(x) in the text, then look for
an r(A) in the preceding text, or, if that fails, look for an axiom of the form

$$(\forall y)\, p(y) \supset (\exists z)\, q(z, y) \wedge r(z)$$

and a p(B) in the preceding text or the knowledge base, and make the appropriate
identifications.

Alternatively, we can view Charniak and Goldman’s rule as an axiom schema, one of
whose instances is

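$$
\begin{aligned}
(\forall x)\, engine(x) \supset\ & [(\exists y)\, engine(y) \wedge y = x] \\
\vee\ & [(\exists y)\, car(y) \wedge engine\text{-}of(x, y)] \\
\vee\ & [(\exists y)\, truck(y) \wedge engine\text{-}of(x, y)] \\
\vee\ & [(\exists y)\, plane(y) \wedge engine\text{-}of(x, y)] \\
\vee\ & \ldots
\end{aligned}
$$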

Kautz (1987) and Konolige (1990) point out that abduction can be viewed as nonmonotonic
reasoning with closure axioms and minimization over causes. That is, where there
are a number of potential causes expressed as axioms of the form $P_i \supset Q$, we can
write the closure axiom $Q \supset P_1 \vee P_2 \vee \ldots$, saying that if Q holds, then one of
the $P_i$ must be its explanation. Then instead of backward-chaining through axioms of the
first sort, one forward-chains through axioms of the second sort. Minimization over the $P_i$,
or assuming as many of them as possible to be false, then selects the most economic
conjunctions of $P_i$ for explaining Q. Charniak and Goldman’s approach is one of
forward-chaining and minimization, whereas our approach is one of backward-chaining.
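Minimization over causes can be illustrated with a small Python sketch. It makes the simplifying assumption that an explanation is just a set of causes assumed true. The function enumerates candidate sets in order of size, keeps those that satisfy every closure axiom (each observation has at least one true cause), and discards any superset of an explanation already found. The result therefore contains only the minimal explanations, the most economical first. The names `P1`, `Q1`, and so on, are placeholders.

```python
from itertools import combinations


def minimal_explanations(observations, causes):
    """Subset-minimal sets of causes that explain every observation.

    causes maps each cause P to the observations it entails (axioms P -> Q).
    Closure axioms require each observation to have at least one true cause;
    minimization keeps only sets with no smaller explaining subset.
    Results are ordered smallest first.
    """
    names = sorted(causes)

    def explains(subset):
        return all(any(obs in causes[c] for c in subset) for obs in observations)

    found = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            subset = set(combo)
            # Skip any superset of an explanation already found.
            if explains(subset) and not any(prev <= subset for prev in found):
                found.append(subset)
    return found


causes = {"P1": {"Q1"}, "P2": {"Q2"}, "P3": {"Q1", "Q2"}}
print(minimal_explanations({"Q1", "Q2"}, causes))
```

Both {P3} and {P1, P2} are minimal, but {P3} explains both observations with a single assumption and so comes first.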

In more recent work, Goldman and Charniak (1990; Charniak and Goldman, 1989)
have begun to implement their interpretation procedure in the form of an incrementally
built belief network (Pearl, 1988), where the links between the nodes, representing
influences between events, are determined from the axioms, stated as described above. They
feel that one can make not unreasonable estimates of the required probabilities, giving a
principled semantics to the numbers. The networks are then evaluated and ambiguities
are resolved by looking for the highest resultant probabilities.

It is clear that minimality in the number of assumptions is not, by itself, adequate
for choosing among interpretations; this is why we have added weights. Ng and Mooney
(1990) have proposed another criterion, which they call “explanatory coherence”. They
define a “coherence metric” that gives special weight to observations explained by other
observations. One ought to be able to achieve this by factoring, but they give examples
where factoring does not work. Their motivating examples, however, are generally short,
two-sentence texts, where they fail to take into account that one of the facts to be explained
is the adjacency of the sentences in a single, coherent text. When one does, one sees that
their supposedly simple but low-coherence explanations are bad just because they explain
so little. We believe it remains to be established that the coherence metric achieves
anything that a minimality metric does not.

There has been other recent work on using abduction in the solution of various
natural language problems, including the problems of lexical ambiguity (Dasigi, 1988, 1990),
structural ambiguity (Nagao, 1989), and lexical selection (Zadrozny and Kokar, 1990).

8  Future  Directions 

8.1 Making Abduction More Efficient

Deduction is explosive, and since the abduction scheme augments deduction with two
more options at each node, assumption and factoring, it is even more explosive. We are
currently engaged in an empirical investigation of the behavior of this abductive scheme
on a knowledge base of nearly 600 axioms, performing relatively sophisticated linguistic
processing. So far, we have begun to experiment, with good results, with three different
techniques for controlling abduction: a type hierarchy, unwinding or avoiding transitivity
axioms, and various heuristics for reducing the branch factor of the search.

We expect our investigation to continue to yield techniques for controlling the
abduction process.

The Type Hierarchy: The first example on which we tested the abductive scheme
was the sentence

There was adequate lube oil.

The system got the correct interpretation, that the lube oil was the lube oil in the lube oil
system of the air compressor, and it assumed that that lube oil was adequate. But it also
got another interpretation. There is a mention in the knowledge base of the adequacy of
the lube oil pressure, so the system identified that adequacy with the adequacy mentioned
in the sentence. It then assumed that the pressure was lube oil.

It is clear what went wrong here. Pressure is a magnitude whereas lube oil is a
material, and magnitudes can’t be materials. In principle, abduction requires a check
for the consistency of what is assumed, and our knowledge base should have contained
axioms from which it could be inferred that a magnitude is not a material. In practice,
unconstrained consistency checking is undecidable and, at best, may take a long time.
Nevertheless, one can, through the use of a type hierarchy, eliminate a very large number
of possible assumptions that are likely to result in an inconsistency. We have consequently
implemented a module that specifies the types that various predicate-argument positions
can take on, and the likely disjointness relations among types. This is a way of exploiting
the specificity of the English lexicon for computational purposes. This addition led to a
speed-up of two orders of magnitude.
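The pruning that such a type module performs can be sketched as follows. The hierarchy fragment and the disjointness declaration below are hypothetical, written only to mirror the lube oil example. The check walks up the supertypes of both types and fails if any pair has been declared disjoint, which is far cheaper than general consistency checking.

```python
# Hypothetical fragment of a type hierarchy: each type points to its parent.
PARENT = {"pressure": "magnitude", "lube-oil": "material", "fluid": "material"}
# Pairs of types declared (heuristically) disjoint.
DISJOINT = {frozenset({"magnitude", "material"})}


def ancestors(t):
    """t followed by all of its supertypes."""
    chain = [t]
    while t in PARENT:
        t = PARENT[t]
        chain.append(t)
    return chain


def compatible(t1, t2):
    """False if the two types, or any of their supertypes, are declared disjoint."""
    return not any(
        frozenset({a, b}) in DISJOINT for a in ancestors(t1) for b in ancestors(t2)
    )


# Assuming that the lube oil pressure is lube oil would identify a magnitude
# with a material, so that assumption is pruned before any proof search.
print(compatible("pressure", "lube-oil"))  # False
```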

A further use of the type hierarchy speeds up processing by a factor of 2 to 4. The
types provide prefiltering of relevant axioms for compound nominal, coercion, and other
very general relations. Suppose, for example, that we wish to prove rel(a,b), and we have
the two axioms

$$p_1(x, y) \supset rel(x, y)$$

$$p_2(x, y) \supset rel(x, y)$$

Without a type hierarchy we would have to backward-chain on both of these axioms.
If, however, the first of the axioms is valid only when x and y are of types t1 and t2,
respectively, and the second is valid only when x and y are of types t3 and t4, respectively,
and a and b have already been determined to be of types t1 and t2, respectively, then we
need to backward-chain on only the first of the axioms.
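Prefiltering can be sketched in the same spirit. In this hypothetical Python encoding, each axiom for rel records the types its arguments must have. An axiom is tried only if the already-determined types of a and b match, and an argument whose type is still unknown rules nothing out. For brevity, type matching here is exact equality rather than subsumption in the hierarchy.

```python
from typing import NamedTuple, Optional


class RelAxiom(NamedTuple):
    antecedent: str  # stands for antecedent(x, y) -> rel(x, y)
    x_type: str      # type x must have for the axiom to be valid
    y_type: str      # type y must have for the axiom to be valid


def matches(known: Optional[str], required: str) -> bool:
    # An argument whose type is not yet determined does not filter anything.
    return known is None or known == required


def applicable(axioms, type_of, a, b):
    """Antecedents worth backward-chaining on when proving rel(a, b)."""
    return [
        ax.antecedent
        for ax in axioms
        if matches(type_of.get(a), ax.x_type) and matches(type_of.get(b), ax.y_type)
    ]


rel_axioms = [RelAxiom("p1", "t1", "t2"), RelAxiom("p2", "t3", "t4")]
print(applicable(rel_axioms, {"a": "t1", "b": "t2"}, "a", "b"))  # ['p1']
```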

There is a problem with the type hierarchy, however. In an ontologically promiscuous
notation, there is no commitment in a primed proposition to truth or existence in the real
world. Thus, lube-oil'(e,o) does not say that o is lube oil or even that it exists; rather
it says that e is the eventuality of o’s being lube oil. This eventuality may or may not
exist in the real world. If it does, then we would express this as Rexists(e), and from
that we could derive from axioms the existence of o and the fact that it is lube oil. But
e’s existential status could be something different. For example, e could be nonexistent,
expressed as not(e) in the notation, and in English as “The eventuality e of o’s being lube
oil does not exist,” or simply as “o is not lube oil.” Or e may exist only in someone’s
beliefs or in some other possible world. While the axiom

$$(\forall x)\, pressure(x) \supset \neg lube\text{-}oil(x)$$

is  certainly  true,  the  axiom 

$$(\forall e_1, x)\, pressure'(e_1, x) \supset \neg (\exists e_2)\, lube\text{-}oil'(e_2, x)$$

would not be true. The fact that a variable occupies the second argument position of the
predicate lube-oil' does not mean it is lube oil. We cannot properly restrict that argument
position to be lube oil, or fluid, or even a material, for that would rule out perfectly true
sentences like “Truth is not lube oil.”
Generally, when one uses a type hierarchy, one assumes the types to be disjoint sets
with cleanly defined boundaries, and one assumes that predicates take arguments of only
certain types. There are a lot of problems with this idea. In any case, in our work, we
are not buying into this notion that the universe is typed. Rather, we are using the type
hierarchy strictly as a heuristic, as a set of guesses not about what could or could not
be but about what it would or would not occur to someone to say. When two types are
declared to be disjoint, we are saying that they are certainly disjoint in the real world, and
that they are very probably disjoint everywhere except in certain bizarre modal contexts.
This means, however, that we risk failing on certain rare examples. We could not, for
example, deal with the sentence, “It then assumed that the pressure was lube oil.”

Unwinding or Avoiding Transitivity Axioms: At one point, in order to conclude
from the sentence

Bombs  exploded  at  the  offices  of  French-owned  firms  in  Catalonia. 

that  the  country  in  which  the  terrorist  incident  occurred  was  Spain,  we  wrote  the  following 
axiom: 

$$(\forall x, y, z)\, in(x, y) \wedge part\text{-}of(y, z) \supset in(x, z)$$

That is, if x is in y and y is a part of z, then x is also in z. The interpretation of this
sentence was taking an extraordinarily long time. When we examined the search space, we
discovered that it was dominated by this one axiom. We replaced the axiom with several
axioms that limited the depth of recursion to three, and the problem disappeared.
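The effect of unwinding the transitivity axiom can be sketched as a depth-bounded backward-chaining procedure. The Python below is a hypothetical reconstruction, not the axioms actually written. It proves in(x, z) either from a stated fact or by following part-of links backward, but never more than `depth` links (three, as in the text). With the bound, even cyclic part-of data cannot make the search run forever, a guarantee the unrestricted recursive axiom does not give a naive prover.

```python
def prove_in(x, z, in_facts, part_of, depth=3):
    """Prove in(x, z) using in(x, y) & part-of(y, z) -> in(x, z), unwound so
    that at most `depth` part-of links are followed.

    in_facts: set of (x, y) pairs with in(x, y) stated directly
    part_of:  set of (y, z) pairs with part-of(y, z)
    """
    if (x, z) in in_facts:
        return True
    if depth == 0:
        return False  # the unwound axioms stop here instead of recursing further
    return any(
        prove_in(x, y, in_facts, part_of, depth - 1)
        for (y, whole) in part_of
        if whole == z
    )


facts = {("bombing", "Catalonia")}
regions = {("Catalonia", "Spain")}
print(prove_in("bombing", "Spain", facts, regions))  # True
```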

In general, one must exercise a certain discipline in the axioms one writes. Which
kinds of axioms cause trouble and how to replace them with adequate but less dangerous
axioms is a matter of continuing investigation.

Reducing  the  Branch  Factor  of  the  Search:  It  is  always  useful  to  reduce  the 
branch  factor  of  the  search  for  a  proof  wherever  possible.  We  have  devised  several  heuristics 
so  far  for  accomplishing  this. 

The  first  heuristic  is  to  prove  the  easiest,  most  specific  conjuncts  first,  and  then  to 
propagate  the  instantiations.  For  example,  in  the  domain  of  naval  operations  reports, 
words  like  “Lafayette”  are  treated  as  referring  to  classes  of  ships  rather  than  to  individual 
ships.  Thus,  in  the  sentence 

Lafayette  sighted. 

“Lafayette”  must  be  coerced  into  a  physical  object  that  can  be  sighted.  We  must  prove 
the  expression 

$$(\exists x, y)\, sight(z, y) \wedge rel(y, x) \wedge Lafayette(x)$$

The predicate Lafayette is true only of the entity LAFAYETTE-CLASS. Thus, rather
than trying to prove rel(y,x) first, leading to a very explosive search, we try first to
prove Lafayette(x). We succeed immediately, and propagate the value LAFAYETTE-CLASS
for x. We thus have to prove rel(y, LAFAYETTE-CLASS). Because of the type of
LAFAYETTE-CLASS, only one axiom applies, namely, the one allowing coercions from
types to tokens that says that y must be an instance of LAFAYETTE-CLASS.
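The first heuristic amounts to ordering conjuncts by specificity and then propagating bindings. This is a minimal Python sketch. It assumes that specificity is measured by how many entities a predicate is known to hold of; the count 40 for sight is invented for the example, and a predicate with no count, such as rel, is treated as maximally general.

```python
import math


def order_conjuncts(conjuncts, extension_size):
    """Put the most specific conjuncts first.

    extension_size maps a predicate to the number of entities it is known to
    be true of; predicates missing from the map (e.g. rel) are treated as
    unboundedly general. sorted() is stable, so ties keep their order.
    """
    return sorted(conjuncts, key=lambda c: extension_size.get(c[0], math.inf))


def substitute(conjunct, bindings):
    """Propagate variable bindings into a conjunct."""
    pred, *args = conjunct
    return (pred, *[bindings.get(a, a) for a in args])


logical_form = [("sight", "z", "y"), ("rel", "y", "x"), ("Lafayette", "x")]
ordered = order_conjuncts(logical_form, {"Lafayette": 1, "sight": 40})
print(ordered[0])                    # ('Lafayette', 'x')
bindings = {"x": "LAFAYETTE-CLASS"}  # proving Lafayette(x) succeeds at once
print(substitute(("rel", "y", "x"), bindings))  # ('rel', 'y', 'LAFAYETTE-CLASS')
```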

Similar heuristics involve solving reference problems before coercion problems and
proving conjuncts whose source is the head noun of a noun phrase before proving conjuncts
derived from adjectives.

Another heuristic is to eliminate assumptions wherever possible. We are better off
if at any node, rather than having either to prove an atomic formula or to assume it,
we only have to prove it. Some predicates are therefore marked as nonassumable. One
category of such predicates is the “closed-world predicates”, those predicates such that
we know all entities of which the predicate is true. Predicates representing proper names,
such as Enterprise, and classes, such as Lafayette, are examples. We don’t assume these
predicates because we know that if they are true of some entity, we will be able to prove
it.

Another  category  of  such  predicates  is  the  “schema-related”  predicates.  In  the  naval 
operations  domain,  the  task  is  to  characterize  the  participants  in  incidents  described  in 
the  message.  This  is  done  as  described  in  Section  5.7.  A  schema  is  encoded  by  means  of 
a  schema  predication,  with  an  argument  for  each  role  in  the  schema.  Lexical  realizations 
and  other  consequences  of  schemas  are  encoded  by  means  of  schema  axioms.  Thus,  in 
the  jargon  of  naval  operations  reports,  a  plane  can  splash  another  plane.  The  underlying 
schema  is  called  Init-Act.  There  is  thus  an  axiom 

$$(\forall x, y, \ldots)\, Init\text{-}Act(x, y, attack, \ldots) \supset splash(x, y)$$

Schema-related predicates like splash occurring in the logical form of a sentence are given
very large assumption costs, effectively preventing their being assumed. The weight
associated with the antecedent of the schema axioms is very small, so that the schema
predication can be assumed very cheaply. This forces backward-chaining into the schema.

In addition, in the naval operations application, coercion relations are never assumed,
since constraints on the arguments of predicates are what drive the use of the type
hierarchy.
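The effect of these assumability markings can be sketched with hypothetical numbers. In the Python below, closed-world predicates and coercion relations get an infinite assumption cost, schema-related predicates get a large multiplier, and a schema axiom with a tiny weight makes backward-chaining into the schema far cheaper than assuming splash directly. The multiplier 1000 and the weight 0.0001 are illustrative assumptions, not values taken from the system.

```python
import math

# Hypothetical predicate classes for the naval operations domain.
CLOSED_WORLD = {"Enterprise", "Lafayette"}  # all entities they hold of are known
COERCION = {"rel"}                          # coercion relations: never assumed
SCHEMA_RELATED = {"splash"}                 # should be explained by a schema
LARGE = 1000.0                              # hypothetical "very large" multiplier


def assumption_cost(pred, base_cost):
    if pred in CLOSED_WORLD or pred in COERCION:
        return math.inf  # nonassumable: must be proved or the branch fails
    if pred in SCHEMA_RELATED:
        return LARGE * base_cost
    return base_cost


def cheapest_option(pred, base_cost, schema_weight=None):
    """Compare assuming pred directly with backward-chaining on a schema axiom
    Init-Act(...)^w -> pred(...) and assuming the schema predication."""
    goal_cost = assumption_cost(pred, base_cost)
    options = [("assume", goal_cost)]
    if schema_weight is not None:
        options.append(("schema", schema_weight * goal_cost))
    return min(options, key=lambda option: option[1])


print(cheapest_option("splash", 10.0, schema_weight=0.0001))
```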

Factoring  also  multiplies  the  size  of  the  search  tree  wherever  it  can  occur.  As  explained 
above,  it  is  a  very  powerful  method  for  coreference  resolution.  It  is  based  on  the  principle 
that  where  it  can  be  inferred  that  two  entities  have  the  same  property,  there  is  a  good 
possibility  that  the  two  entities  are  identical.  However,  this  is  true  only  for  fairly  specific 
properties.  We  don’t  want  to  factor  predicates  true  of  many  things.  For  example,  to 
resolve  the  noun  phrase 

ships and planes

we need to prove the expression

$$(\exists x, s_1, y, s_2)\, Plural(x, s_1) \wedge ship(x) \wedge Plural(y, s_2) \wedge plane(y)$$

where Plural is taken to be a relation between the typical element of a set and the set itself.
If we applied factoring indiscriminately, then we would factor the conjuncts Plural(x,s1)
and Plural(y,s2), identifying x with y and s1 with s2. If we were lucky, this interpretation
would be rejected because of a type violation (planes aren’t ships). But this would waste
time. It is more reasonable to say that very general predicates such as Plural provide no
evidence for identity.
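Restricting factoring to specific predicates can be sketched as a filter on candidate pairs. The set of general predicates is a hypothetical parameter. In this Python sketch only Plural is in it, so the logical form for “ships and planes” yields no factoring candidates at all.

```python
GENERAL = frozenset({"Plural"})  # hypothetical: too general to signal identity


def factoring_candidates(conjuncts, general=GENERAL):
    """Pairs of conjuncts with the same predicate that are worth unifying.

    Conjuncts whose predicate is in `general` are never paired, since sharing
    such a property is no evidence that two entities are identical.
    """
    pairs = []
    for i, first in enumerate(conjuncts):
        for second in conjuncts[i + 1:]:
            if first[0] == second[0] and first[0] not in general:
                pairs.append((first, second))
    return pairs


ships_and_planes = [
    ("Plural", "x", "s1"), ("ship", "x"), ("Plural", "y", "s2"), ("plane", "y"),
]
print(factoring_candidates(ships_and_planes))  # []: the Plural pair is skipped
```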

The type hierarchy, the discipline imposed in writing axioms, and the heuristics for
limiting search all make the system less powerful than it would otherwise be, but we
implement these techniques for the sake of efficiency. We are trying to locate the system
on a scale whose extremes are efficiency and power. Where on that scale we achieve
optimal performance is a matter of ongoing investigation.

8.2  Other  Pragmatics  Problems 

In this article we have described our approach to the problems of reference resolution,
compound nominal interpretation, lexical and syntactic ambiguity, metonymy resolution,
and schema recognition. These approaches have been worked out, implemented, and
tested on a fairly large scale. We intend similarly to work out the details of an abductive
treatment of other problems in discourse interpretation. Among these problems are the
problems of metaphor interpretation, the resolution of quantifier scope ambiguities, and
the recognition of the relation between the utterance and the speaker’s plan. Metaphor
interpretation is discussed in Hobbs (1991). We will indicate very briefly for the other two
problems what an abductive approach might look like.

Resolving Quantifier Scope Ambiguities: Hobbs (1983b) proposed a flat
representation for sentences with multiple quantifiers, consisting of a conjunction of atomic
formulas, by admitting variables denoting sets and typical elements of sets, where the
typical elements behave essentially like reified universally quantified variables, similar to
McCarthy’s (1977) “inner variables”. Webber (1978), Van Lehn (1978), Mellish (1985),
and Fahlman (1979) have all urged similar approaches in some form or other, although
the technical details of such an approach are by no means easy to work out. (See Shapiro,
1980.) In such an approach, the initial logical form of a sentence, representing all that
can be determined from syntactic analysis alone without recourse to world knowledge, is
neutral with respect to the various possible scopings. As various constraints on the
quantifier structure are discovered during pragmatics processing, the information is represented
in the form of predications expressing “functional dependence” relations among sets and
their typical elements. For example, in

Three women in our group had a baby last year.

syntactic analysis of the sentence tells us that there is an entity w that is the typical
element of a set of women, the cardinality of which is three, and there is an entity b that
in some sense is a baby. What needs to be inferred is that b is functionally dependent on
w.
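The flat representation can be made concrete with a toy Python encoding in which a logical form is a list of atomic formulas written as tuples. The predicate names typelt, card, and fdep are hypothetical stand-ins for the typical-element, cardinality, and functional-dependence predications. The function only reads off how many babies a given set of predications commits us to; it does not attempt the inference of the dependency, which remains open.

```python
def instances(var, logical_form):
    """How many distinct entities `var` stands for under the dependencies
    recorded in a flat logical form (a list of atomic formulas as tuples)."""
    typical = {}  # typical element -> its set
    card = {}     # set -> cardinality
    depends = {}  # variable -> variable it is functionally dependent on
    for pred, *args in logical_form:
        if pred == "typelt":
            typical[args[0]] = args[1]
        elif pred == "card":
            card[args[0]] = args[1]
        elif pred == "fdep":
            depends[args[0]] = args[1]
    if var in typical:
        return card[typical[var]]
    if var in depends:
        return instances(depends[var], logical_form)
    return 1  # no dependency recorded: a single entity (wide scope)


# "Three women in our group had a baby last year."
lf = [("typelt", "w", "s"), ("card", "s", 3), ("woman", "w"),
      ("baby", "b"), ("have", "w", "b")]
print(instances("b", lf))                         # 1: one baby shared by all three
print(instances("b", lf + [("fdep", "b", "w")]))  # 3: a baby per woman
```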

In an abductive framework, what needs to be worked out is what mechanism will
be used to infer the functional dependency. Is it, for example, something that must
be assumed in order to avoid contradiction when the main predication of the sentence
is assumed? Or is it something that we somehow infer directly from the propositional
content of the sentence? The problem remains to be worked out.
It may also be that if the quantifier scoping possibilities were built into the grammar
rules in the integrated approach of Section 6, much as Montague (1974) did, the whole
problem of determining the scopes of quantifiers would simply disappear into the larger
problem of searching for the best interpretation, just as the problem of syntactic ambiguity
did.

Recognizing the Speaker’s Plan: It is a very common view that to interpret an
utterance is to discover its relation to the speaker’s presumed plan, and on any account,
discovering this relation is an important component of an interpretation. The most
fundamental of the objections that Norvig and Wilensky (1990) raise to current abductive
approaches to discourse interpretation is that they take as their starting point that the
hearer must explain why the utterance is true rather than what the speaker was trying to
accomplish with it. We agree in part with this criticism.

Let  us  look  at  things  from  the  broadest  possible  context.  An  intelligent  agent  is 
embedded  in  the  world.  Just  as  a  hearer  must  explain  why  a  sequence  of  words  is  a 
sentence  or  a  coherent  text,  our  agent  must,  at  each  instant,  explain  why  the  complete 
set  of  observables  it  is  encountering  constitutes  a  coherent  situation.  Other  agents  in  the 
environment  are  viewed  as  intentional,  that  is,  as  planning  mechanisms,  and  that  means 
their  observable  actions  are  sequences  of  steps  in  a  coherent  plan.  Thus,  making  sense  of 
the  environment  entails  making  sense  of  other  agents’  actions  in  terms  of  what  they  are 
intended  to  achieve.  When  those  actions  are  utterances,  the  utterances  must  be  related  to 
the  goals  those  agents  are  trying  to  achieve.  That  is,  the  speaker’s  plan  must  be  recognized. 

Recognizing the speaker’s plan is a problem of abduction. If we encode as axioms
beliefs about what kinds of actions cause and enable what kinds of events and conditions,
then in the presence of complete knowledge, it is a matter of deduction to prove that a
sequence or more complex arrangement of actions will achieve an agent’s goals, given the
agent’s beliefs. Unfortunately, we rarely have complete knowledge. We will almost always
have to make assumptions. That is, abduction will be called for. To handle this aspect of
interpretation in our framework, therefore, we can take it as one of our tasks, in addition
to proving the logical form, to prove abductively that the utterance contributes to the
achievement of a goal of the speaker, within the context of a coherent plan. In the process
we ought to find ourselves making many of the assumptions that hearers make when they
are trying to “psych out” what the speaker is doing by means of his or her utterance.
Appelt and Pollack (1990) have begun research on how weighted abduction can be used
for the plan ascription problem.

There is a point, however, at which the “intentional” view of interpretation becomes
trivial. It tells us that the proper interpretation of a compound nominal like “coin copier”
means what the speaker intended it to mean. This is true enough, but it offers us virtually
no assistance in determining what it really does mean. It is at this point that the
“informational” view of interpretation comes into play. We are working for the most part
in the domain of common knowledge, so in fact what the speaker intended a sentence
to mean is just what can be proved to be true from that base of common knowledge.
That is, the best interpretation of the sentence is the best explanation for why it would
be true, given the speaker’s and hearer’s common knowledge. So while we agree that the
intentional view of interpretation is correct, we believe that the informational view is a
necessary component of it, a component that, moreover, in analyzing long written texts
and monologues, completely overshadows considerations of intention.

Another way to put it is this. We need to figure out why the speaker uttered a sequence
of words that conveyed that particular content. This involves two parts: the informational
aspect of figuring out what the particular content is, and the intentional aspect of figuring
out why the speaker wished to convey it. In this paper we have focused on the former
aspect. We are now working on an approach that will encompass the two. In such a
combined approach, we should be able to interpret ironic statements and tautologies, for
example, from intentional considerations, as well as use informational considerations to
interpret the more ordinary sorts of discourse discussed in this article.

8.3  What  the  Numbers  Mean 

The problem of how to combine symbolic and numeric schemes in the most effective way,
exploiting the expressive power of the first and the evaluative power of the second, is one
of the most significant problems that faces researchers in artificial intelligence today. The
abduction scheme we have presented attempts just this. However, our numeric component
is highly ad hoc at the present time. We need a more principled account of what the
numbers mean. Here we point out several possible lines of investigation.

Charniak and Shimony (1990) have proposed a probabilistic semantics for weighted
abduction schemes, under several simplifying assumptions. They consider only the
propositional case, so, for example, no factoring or equality assumptions are needed. From
our point of view, this is not a limitation in their account. If we take one of our proofs,
represented  by  a  directed  acyclic  graph  with  costs  attached,  each  node  or  literal  being 
different,  we  can  treat  it  as  propositional  with  variables  standing  for  unnamed  constants. 
Their interpretation of the costs as probabilities would apply to this proof, and we could
a  posteriori  interpret  the  proof  in  their  probabilistic  terms. 

They  also  make  the  simplifying  assumption  that  a  proposition  always  has  the  same  cost, 
wherever it occurs in the inference process, although rules themselves may also have an
associated cost. They concern themselves only with the probability that the propositions
are true, and do not try to incorporate utilities into their cost functions as we do. This is
a more significant simplification. We believe we benefit from flexible assignment of costs
to goals, their propagation by weights, and their sharing by factoring. We sometimes
equate  high  assumption  cost  with  the  disutility  of  not  proving  something,  rather  than 
its  improbability.  For  example,  in  the  compound  nominal  problem,  we  strongly  believe 
the  nn  relations  are  true,  but  we  give  them  high  assumption  costs,  not  because  they  are 
improbable,  but  because  it  is  important  for  us  to  explain  rather  than  assume  them. 

Charniak and Shimony show that a set of axioms satisfying their restrictions can be
converted into a Bayesian network where the negative logarithms of the prior probabilities
of the nodes are the assumability costs of the propositions. They then show that the
assignment of truth values to the nodes in the Bayesian network with maximum probability
given the evidence is equivalent to the assignment of truth values to the propositions that
minimizes  cost. 
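The cost/probability correspondence behind this result rests on a simple identity: if each assumption cost is the negative logarithm of a prior probability and the assumptions are treated as independent, then the total cost of a set of assumptions is the negative logarithm of their joint probability, so the cheapest set is also the most probable one. The following Python sketch shows only that identity, not Charniak and Shimony's Bayesian-network construction; the proposition names and prior values are invented for illustration.

```python
import math

# Illustrative priors (assumed values, not from the paper); the propositions are
# treated as independent so that joint probabilities are simple products.
PRIORS = {"wind_blowing": 0.7, "someone_shaking_tree": 0.05}

def assumption_cost(props):
    """Cost of assuming every proposition in props, with cost(p) = -log Pr(p)."""
    return sum(-math.log(PRIORS[p]) for p in props)

def joint_probability(props):
    """Probability that every proposition in props holds (independence assumed)."""
    return math.prod(PRIORS[p] for p in props)

# Competing explanations for the same observation (a tree waving back and forth).
candidates = [
    frozenset({"wind_blowing"}),
    frozenset({"someone_shaking_tree"}),
    frozenset({"wind_blowing", "someone_shaking_tree"}),
]

cheapest = min(candidates, key=assumption_cost)
most_probable = max(candidates, key=joint_probability)
print(sorted(cheapest), sorted(most_probable))  # ['wind_blowing'] ['wind_blowing']
```

Because the negative logarithm is strictly decreasing, the two selections always agree under these assumptions; the interesting part of the published result is extending this to dependent propositions via the network.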

We  view  this  as  a  very  promising  start  toward  a  semantics  for  the  less  restricted 
abduction scheme we have used.

Let  us  turn  now  to  a  detailed  consideration  of  our  weighted  abduction  scheme.  We 
tend to agree with Charniak and Shimony that a principled approach is most likely to be
one  that  relies  on  probability.  But  what  is  the  space  of  events  over  which  the  probabilities 
are  to  be  calculated?  It  is  a  rather  glaring  problem  in  Goldman’s  (1990)  otherwise  very 
fine  work  that  he  bases  his  probabilities  on  occurrences  in  the  actual  world.  This  leads  to 
very  implausible  results.  Thus,  in 

John  wanted  to  hang  himself.  He  got  a  rope. 

the  probability  that  the  rope  implied  by  the  hanging  is  the  same  as  the  rope  mentioned 
in  the  second  sentence  is  taken  to  be  the  very  low  probability  that  two  randomly  selected 
ropes in the real world would be identical. The problem is that we must base our
probabilities not on occurrences in the real world but on frequency of utilization in the texts
we  are  interpreting. 

Suppose  we  are  given  our  corpus  of  interest.  Imagine  that  a  TACITUS-system-in- 
the-sky  runs  on  this  entire  corpus,  interpreting  all  the  texts  and  instantiating  all  the 
abductive inferences it has to draw, producing the correct proof graphs. This gives us a
set  of  propositions  Q  occurring  in  the  texts  and  some  propositions  P  assumed  or  drawn 
from  the  knowledge  base.  It  seems  reasonable  that  the  appropriate  probabilities  and 
conditional  probabilities  are  those  involving  instances  of  the  concepts  P  and  instances  of 
concepts  Q  in  this  space. 
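To make this event space concrete, here is a minimal Python sketch. It assumes, as a simplification of our own rather than the paper's definition, that each event is one entity from the corrected proof graphs, recorded as the set of concepts it instantiates, and that conditional probabilities are plain relative frequencies over those events. The concept names and data are hypothetical.

```python
# Hypothetical event space: one event per entity in the corrected proof graphs,
# recorded as the set of concepts (from the text or the knowledge base) it instantiates.
EVENTS = [
    {"rope", "hanging-instrument"},
    {"rope", "hanging-instrument"},
    {"rope", "tying-instrument"},
    {"rope"},
    {"copier", "machine"},
]

def conditional_probability(p, q, events=EVENTS):
    """Relative-frequency estimate of Pr(p | q) over the event space."""
    q_events = [e for e in events if q in e]
    if not q_events:
        raise ValueError(f"no instances of {q!r} in the event space")
    return sum(p in e for e in q_events) / len(q_events)

print(conditional_probability("hanging-instrument", "rope"))  # 0.5
print(conditional_probability("rope", "hanging-instrument"))  # 1.0
```

Counting over the interpreted corpus rather than the world is the point: a rope mentioned right after a hanging is, in such texts, very often the hanging rope.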

Given  this  space  of  events,  let  us  examine  the  weights  in  our  abduction  scheme.  The 
first question is how the weights should be distributed across the conjuncts in the
antecedents of Horn clauses. In formula (6), repeated here for convenience,

(6) $P_1^{w_1} \wedge P_2^{w_2} \supset Q$

one has the feeling that the weights should correspond somehow to the semantic
contribution that each of $P_1$ and $P_2$ makes to Q. The semantic contribution of $P_i$ to Q may best
be understood in terms of the conditional probability that an instance of concept $P_i$ is an
instance of concept Q in the space of events, $\Pr(Q \mid P_i)$. If we distribute the total weight
$w$ of the antecedent of (6) according to these conditional probabilities, then $w_i$ should
vary directly with $w$ and with $\Pr(Q \mid P_i)$, normalized somehow by the combination of
$\Pr(Q \mid P_1)$ and $\Pr(Q \mid P_2)$. Following Charniak and Shimony in interpreting costs as
negative logarithms of probabilities, it may be that $w_i$ should be given by something like
the formula

The next question is what the total weight on the antecedent should be. To address
this question, let us suppose that all the axioms have just one conjunct in the antecedent.
Then we consider the set of axioms that have Q as the conclusion:

$P_1^{w_1} \supset Q$
$P_2^{w_2} \supset Q$
...
$P_n^{w_n} \supset Q$

Intuitively, the price we will have to pay for the use of each axiom should be inversely
related to the likelihood that Q is true by virtue of that axiom. That is, we want to look
at the conditional probability that $P_i$ is true given Q, $\Pr(P_i \mid Q)$. The weights $w_i$ should
be ordered in the reverse order of these conditional probabilities. We need to include in
this ordering the likelihood of Q occurring in the space of events without any of the $P_i$
occurring, $\Pr(\neg P_1 \wedge \ldots \wedge \neg P_n \mid Q)$, to take care of those cases where the best assumption
for Q was simply Q itself. In assigning weights, this should be anchored at 1, and the
weights $w_i$ should be assigned accordingly.
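One hypothetical way to realize this ordering is sketched below in Python. Taking each weight to be a ratio of negative log probabilities is our own choice, made to echo the Charniak and Shimony reading of costs; the text above fixes only the ordering and the anchor at 1, not a formula. The antecedent names and probability values are illustrative.

```python
import math

def assign_weights(cond_probs, p_none):
    """Hypothetical weights for axioms P_i^{w_i} ⊃ Q sharing the conclusion Q.

    cond_probs maps each antecedent P_i to Pr(P_i | Q); p_none is the probability,
    given Q, that none of the P_i holds (Q is best simply assumed). Each weight is
    a ratio of negative log probabilities, so the "assume Q itself" case is anchored
    at 1 and more probable antecedents receive smaller weights.
    """
    if not 0.0 < p_none < 1.0:
        raise ValueError("p_none must lie strictly between 0 and 1")
    anchor = -math.log(p_none)
    weights = {}
    for p, pr in cond_probs.items():
        if not 0.0 < pr <= 1.0:
            raise ValueError(f"Pr({p} | Q) must lie in (0, 1]")
        weights[p] = -math.log(pr) / anchor
    return weights

weights = assign_weights({"P1": 0.6, "P2": 0.3}, p_none=0.1)
print({p: round(w, 3) for p, w in weights.items()})  # {'P1': 0.222, 'P2': 0.523}
```

Any strictly decreasing map from $\Pr(P_i \mid Q)$ to weight that sends the "none of the $P_i$" case to 1 would respect the same ordering; this one merely keeps the logarithmic flavor.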

All of this is only the coarsest pointer to a serious treatment of the weights in terms
of  probabilities. 

Appelt  (1990),  by  contrast,  is  exploring  an  approach  to  the  semantics  of  the  weights, 
based  not  on  probabilities  but  on  preference  relations  among  models,  as  Shoham  (1987) 
has done for nonmonotonic logics. Briefly, when we have two axioms of the form

$P_1^{w_1} \supset Q$
$P_2^{w_2} \supset Q$

where $w_1$ is less than $w_2$, we take this to mean that every model in which $P_1$, Q, and $\neg P_2$
are true is preferred over some model in which $P_2$, Q, and $\neg P_1$ are true. Appelt's approach
exposes problems of unintended side effects. Elsewhere among the axioms, $P_2$ may entail
a highly preferred proposition, even though $w_2$ is larger than $w_1$. To get around this
problem, Appelt must place very tight global constraints on the assignment of weights.
This difficulty may be fundamental, resulting from the fact that the abduction scheme
attempts to make global judgments on the basis of strictly local information.

So  far  we  have  only  talked  about  the  semantics  of  the  weights,  and  not  the  costs.  Hasida 
(personal communication) has suggested that the costs and weights be viewed along the
lines of an economic model of supply and demand. The requirement to interpret texts
creates a demand for propositions to be proved. The costs reflect that demand. Those
most likely to anchor the text referentially are the ones that are in the greatest demand;
therefore, they cost the most to assume. The supply, on the other hand, corresponds to
the probability that the propositions are true. The more probable the proposition, the
less it should cost to assume, hence the smaller the weight.

A  further  requirement  for  the  scoring  scheme  is  that  it  incorporate  not  only  the  costs 
of  assumptions,  but  also  the  costs  of  inference  steps,  where  highly  salient  inferences  cost 
less than inferences of low salience. The obvious way to do this is to associate costs with
the use of each axiom, where the costs are based on the axiom's salience, and to levy that
cost as a charge for each proof step involving the axiom. If we do this, we need a way
of  correlating  the  cost  of  inference  steps  with  the  cost  of  assumptions;  there  must  be  a 
common  coin  of  the  realm.  In  order  to  relate  assumption  costs  and  inference  costs,  two 
moves  are  called  for:  interpreting  the  cost  of  inference  as  uncertainty  and  interpreting 
salience  as  truth  in  a  local  theory. 

The  first  move  is  to  recognize  that  virtually  all  of  our  knowledge  is  uncertain  to  some 
degree. Then we can view the cost of using an axiom to be a result of the greater
uncertainty that is introduced by assuming that axiom is true. This can be done with "et
cetera" propositions, either at the level of the axiom as a whole or at the level of its
instantiations. To associate the cost with the general axiom, we can write our axioms as
follows: 

$(\forall x)[p(x) \wedge etc_1^{c_1} \supset q(x)]$

That  is,  there  is  no  dependence  on  x.  Then  we  can  use  any  number  of  instances  of  the 
axiom once we pay the price $c_1$. To associate the cost with each instantiation of the axiom,
we  can  write  our  axioms  as  follows: 

$(\forall x)[p(x) \wedge etc_1(x)^{c_1} \supset q(x)]$

Here we must pay the price of $c_1$ for every instance of the axiom we use. The latter style
seems  more  reasonable. 

Furthermore, it seems reasonable not to charge for multiple uses of particular
instantiations of axioms; we need to pay for $etc_1(A)$ only once for any given A. This intuition
supports  the  uncertainty  interpretation  of  inference  costs. 
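The two charging styles can be contrasted on a toy proof trace. This Python sketch assumes a hypothetical representation in which each proof step records the axiom used and the entity it was instantiated on; the axiom names, entities, and costs are illustrative, not drawn from the TACITUS axioms.

```python
# Hypothetical proof trace: each step names the axiom used and the entity it is
# instantiated on. Costs c_i are illustrative.
STEPS = [("ax1", "x1"), ("ax1", "x2"), ("ax1", "x1"), ("ax2", "x1")]
COSTS = {"ax1": 0.5, "ax2": 2.0}

def per_axiom_cost(steps, costs):
    """etc_i^{c_i} without arguments: pay c_i once per axiom, however often it is used."""
    return sum(costs[axiom] for axiom in {axiom for axiom, _ in steps})

def per_instance_cost(steps, costs):
    """etc_i(x)^{c_i}: pay c_i once per distinct instantiation; repeating the same
    instantiation is free, just as a repeated assumption is paid for only once."""
    return sum(costs[axiom] for axiom, _ in set(steps))

print(per_axiom_cost(STEPS, COSTS))     # 2.5
print(per_instance_cost(STEPS, COSTS))  # 3.0
```

The second use of ax1 on x1 adds nothing under either style, while its use on x2 is charged only in the per-instantiation style.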

It is easy to see how a salience measure can be implemented in this scheme. Less
salient axioms have higher associated costs $c_i$. These costs can be changed from situation
to situation if we take the cost $c_i$ to be not a constant but a function that is sensitive
somehow  to  the  contextual  factors  affecting  the  salience  of  different  clusters  of  knowledge. 
Alternatively,  if  axioms  are  grouped  into  clusters  and  tagged  with  the  cluster  they  belong 
to,  as  in 

$(\forall x)[p(x) \wedge cluster_1^{c_1} \supset q(x)]$

then  whole  clusters  can  be  moved  from  low  salience  to  high  salience  by  paying  the  cost 
$c_1$ of the “proposition” $cluster_1$ exactly once. This axiom may be read as saying that if p
is true of x and the cluster of facts $cluster_1$ is relevant, then q is true of x.
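A minimal Python sketch of cluster-based charging follows, assuming a hypothetical tagging of axioms with clusters and a context that simply names which clusters are currently salient. Both the cost values and the rule "salient clusters become cheap" are illustrative stand-ins for whatever contextual function would actually determine the costs.

```python
# Hypothetical tagging of axioms with the knowledge cluster they belong to.
AXIOM_CLUSTER = {"ax1": "cluster1", "ax2": "cluster1", "ax3": "cluster2"}
BASE_COST = {"cluster1": 5.0, "cluster2": 5.0}
SALIENT_COST = 0.5  # assumed cost of a cluster the context has made salient

def cluster_costs(salient_clusters):
    """Context-sensitive cluster costs: salient clusters become cheap to use."""
    return {c: (SALIENT_COST if c in salient_clusters else cost)
            for c, cost in BASE_COST.items()}

def proof_cost(axioms_used, salient_clusters):
    """Charge each cluster's cost once, however many of its axioms the proof uses."""
    costs = cluster_costs(salient_clusters)
    return sum(costs[c] for c in {AXIOM_CLUSTER[a] for a in axioms_used})

print(proof_cost(["ax1", "ax2"], salient_clusters={"cluster1"}))  # 0.5
print(proof_cost(["ax1", "ax2"], salient_clusters=set()))         # 5.0
print(proof_cost(["ax1", "ax3"], salient_clusters={"cluster1"}))  # 5.5
```

Using two axioms from the same cluster costs no more than using one, which is what lets a whole cluster be moved into high salience by a single payment.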

We  suspect  this  use  of  the  costs  can  also  be  interpreted  as  a  measure  of  uncertainty, 
based  on  ideas  discussed  in  Hobbs  (1985c).  There  it  is  argued  that  whenever  intelligent 
agents are interpreting and acting in specific environments, they are doing so not on the
basis of everything they know, their entire knowledge base, but rather on the basis of
local theories that are already in place or that are constructed somehow for the occasion
for reasoning about such situations. At its simplest, a local theory is a relatively small
subset of the entire knowledge base; more complex versions are also imaginable, in which
axioms are modified in some way for the local theory. In this view, a local theory creates
a binary distinction between the axioms that are true in the local theory and the axioms
in the global theory that are not necessarily true. However, in the abductive framework,
the local theory can be given a graded edge by assigning values to the costs $c_i$ in the
right way. Thus, highly salient axioms will be in the core of the local theory and will have
relatively low costs. Low-salience axioms will be ones for which there is a great deal of
uncertainty as to whether they are relevant to the present situation and thus whether they
should actually be true in the local theory; they will have relatively high costs. Salience
can thus be seen as a measure of the certainty that an axiom is true in the local theory.

Josephson et al. (1987) have argued that an evaluation scheme must consider the
following  criteria  when  choosing  a  hypothesis  H  to  explain  some  data  D: 


1.  How  decisively  does  H  surpass  its  alternatives? 

2.  How  good  is  H  by  itself,  independent  of  the  alternatives? 

3.  How  thorough  was  the  search  for  alternatives? 

4. What are the risks of being wrong and the benefits of being right?

5.  How  strong  is  the  need  to  come  to  a  conclusion  at  all? 

Of  these,  our  abduction  scheme  uses  the  weights  and  costs  to  formalize  criterion  2,  and  the 
costs  at  least  in  part  address  criteria  4  and  5.  Criterion  3  is  addressed  in  the  TACITUS 
system  in  that  a  much  deeper  search  is  generally  conducted  for  a  first  proof  than  for 
subsequent  proofs.  But  criterion  1  is  not  accommodated  at  this  time.  The  fact  that  our 
abduction  scheme  does  not  take  into  account  the  competing  possible  interpretations  is  a 
clear  shortcoming  that  needs  to  be  corrected. 
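Criterion 1 would require comparing the best interpretation with its competitors, which the scheme as described does not do. As a purely hypothetical illustration, one crude measure is the cost margin between the cheapest proof and the runner-up, assuming the prover can return the total costs of several complete proofs. The function name and the example costs are our own.

```python
def decisiveness(interpretation_costs):
    """Margin by which the cheapest interpretation beats the runner-up.

    A larger margin means the best hypothesis surpasses its alternatives more
    decisively; with fewer than two candidates there is no competitor to beat.
    """
    if len(interpretation_costs) < 2:
        return float("inf")
    best, runner_up = sorted(interpretation_costs)[:2]
    return runner_up - best

print(decisiveness([3.0, 5.5, 4.0]))  # 1.0
```

A near-zero margin would flag an interpretation that should be held tentatively, which is exactly the information the current scheme discards.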

A  theoretical  account,  such  as  the  one  we  have  sketched,  can  inform  our  intuitions,  but 
in  practice  we  can  only  assign  weights  and  costs  by  a  rough,  intuitive  sense  of  semantic 
contribution, importance, and so on, and refine them by successive approximation on a
representative  sample  of  the  corpus.  But  the  theoretical  account  would  at  least  give  us  a 
clear  view  of  what  the  approximations  are  approximating. 

9  Conclusion 

Interpretation  in  general  may  be  viewed  as  abduction.  When  we  look  out  the  window 
and  see  a  tree  waving  back  and  forth,  we  normally  assume  the  wind  is  blowing.  There 
may  be  other  reasons  for  the  tree’s  motion;  for  example,  someone  below  window  level 
might  be  shaking  it.  But  most  of  the  time  the  most  economical  explanation  coherent 
with the rest of what we know will be that the wind is blowing. This is an abductive
explanation.  Moreover,  in  much  the  same  way  as  we  try  to  exploit  the  redundancy  in 
natural  language  discourse,  we  try  to  minimize  our  explanations  for  the  situations  we 
encounter  by  identifying  disparately  presented  entities  with  each  other  wherever  possible. 
If we see a branch of a tree occluded in the middle by a telephone pole, we assume that there
is indeed just one branch and not two branches twisting bizarrely behind the telephone
pole. If we hear a loud noise and the lights go out, we assume one event happened and
not  two. 

These observations make the abductive approach to discourse interpretation more
appealing. Discourse interpretation is seen, as it ought to be seen, as just a special case of
interpretation. From the viewpoint of Section 6.3, to interpret a text is to prove
abductively that it is coherent, where part of what coherence means is an explanation for why the
text  would  be  true.  Similarly,  one  could  argue  that  faced  with  any  scene  or  other  situation, 
we  must  prove  abductively  that  it  is  a  coherent  situation,  where  part  of  what  coherence 
means  is  explaining  why  the  situation  exists.** 

The  particular  abduction  scheme  we  use,  or  rather  the  ultimate  abduction  scheme 
of which our scheme is an initial version, has a number of other attractive properties.

**This viewpoint leads one to suspect that the brain is, at least in part, a large and complex abduction
machine.
It gives us the expressive power of predicate logic. It allows the defeasible reasoning of
nonmonotonic logics. Its numeric evaluation method begins to give reasoning the “soft
corners” of neural nets. It provides a framework in which a number of traditionally difficult
problems  in  pragmatics  can  be  formulated  elegantly  in  a  uniform  manner.  Finally,  it 
gives  us  a  framework  in  which  many  types  of  linguistic  processing  can  be  formalized  in  a 
thoroughly  integrated  fashion. 